(54) Title: RISC MICROPROCESSOR ARCHITECTURE IMPLEMENTING MULTIPLE TYPED REGISTER SETS


(57) Abstract 

A register system for a data processor which operates in a
plurality of modes. The register system provides multiple,
identical banks of register sets, the data processor controlling
access such that instructions and processes need not specify any
given bank. An integer register set includes first (RA[23:0]) and
second (RA[31:24]) subsets, and a shadow subset (RT[31:24]). While
the data processor is in a first mode, instructions access the
first and second subsets. While the data processor is in a second
mode, instructions may access the first subset, but any attempts
to access the second subset are re-routed to the shadow subset
instead, transparently to the instructions, allowing system
routines to seemingly use the second subset without having to save
and restore data which user routines have written to the second
subset. A re-typable register set provides integer width data and
floating point width data in response to integer instructions and
floating point instructions, respectively. Boolean comparison
instructions specify particular integer or floating point
registers for source data to be compared, and specify a particular
Boolean register for the result, so there are no dedicated,
fixed-location status flags. Boolean combinational instructions
combine specified Boolean registers for performing complex Boolean
comparisons without intervening conditional branch instructions,
to minimize pipeline disruption.
RISC MICROPROCESSOR ARCHITECTURE
IMPLEMENTING MULTIPLE TYPED REGISTER SETS

CROSS-REFERENCE TO RELATED APPLICATIONS

Applications of particular interest to the present
application include:

1. HIGH-PERFORMANCE RISC MICROPROCESSOR ARCHITECTURE,
SC/Serial No. 07/777,006, filed 08 July 1991 by Le T. Nguyen et al;

2. EXTENSIBLE RISC MICROPROCESSOR ARCHITECTURE, SC/Serial
No. 07/727,058, filed 08 July 1991 by Le T. Nguyen et al;

3. RISC MICROPROCESSOR ARCHITECTURE WITH ISOLATED
ARCHITECTURAL DEPENDENCIES, SC/Serial No. 07/726,744, filed
08 July 1991 by Le T. Nguyen et al;

4. RISC MICROPROCESSOR ARCHITECTURE IMPLEMENTING FAST TRAP
AND EXCEPTION STATE, SC/Serial No. 07/726,942, filed 08 July 1991 by
Le T. Nguyen et al;

5. SINGLE CHIP PAGE PRINTER CONTROLLER, SC/Serial No.
07/726,929, filed 08 July 1991 by Derek J. Lentz et al;

6. MICROPROCESSOR ARCHITECTURE CAPABLE OF SUPPORTING
MULTIPLE HETEROGENEOUS PROCESSORS, SC/Serial No. 07/726,893, filed
08 July 1991 by Derek J. Lentz et al.

The above-identified Applications are hereby incorporated 
herein by reference, their collective teachings being part of the 
present disclosure. 

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates generally to microprocessors,
and more specifically to a RISC microprocessor having plural,
symmetrical sets of registers.

Description of the Background

In addition to the usual complement of main memory storage
and secondary permanent storage, a microprocessor-based computer
system typically also includes one or more general purpose data
registers, one or more address registers, and one or more status
flags. Previous systems have included integer registers for
holding integer data and floating point registers for holding
floating point data. Typically, the status flags are used for
indicating certain conditions resulting from the most recently
executed operation. There generally are status flags for
indicating whether, in the previous operation: a carry occurred,
a negative number resulted, and/or a zero resulted.

These flags prove useful in determining the outcome of
conditional branching within the flow of program control. For
example, if it is desired to compare a first number to a second
number and, upon the condition that the two are equal, to branch
to a given subroutine, the microprocessor may compare the two
numbers by subtracting one from the other, and setting or
clearing the appropriate condition flags. The numerical value
of the result of the subtraction need not be stored. A
conditional branch instruction may then be executed, conditioned
upon the status of the zero flag. While being simple to
implement, this scheme lacks flexibility and power. Once the
comparison has been performed, no further numerical or other
operations may be performed before the conditional branch upon
the appropriate flag; otherwise, the intervening instructions
will overwrite the condition flag values resulting from the
comparison, likely causing erroneous branching. The scheme is
further complicated by the fact that it may be desirable to form
greatly complex tests for branching, rather than the simple
equality example given above.

For example, assume that the program should branch to the
subroutine only upon the condition that a first number is greater
than a second number, and a third number is less than a fourth
number, and a fifth number is equal to a sixth number. It would
be necessary for previous microprocessors to perform a lengthy
series of comparisons heavily interspersed with conditional
branches. A particularly undesirable feature of this serial
scheme of comparing and branching is observed in any
microprocessor having an instruction pipeline.

In a pipelined microprocessor, more than one instruction is
being executed at any given time, with the plural instructions
being in different stages of execution at any given moment. This
provides for vastly improved throughput. A typical pipeline
microprocessor may include pipeline stages for: (a) fetching an
instruction, (b) decoding the instruction, (c) obtaining the
instruction's operands, (d) executing the instruction, and
(e) storing the results. The problem arises when a conditional
branch instruction is fetched. It may be the case that the
conditional branch's condition cannot yet be tested, as the
operands may not yet be calculated, if they are to result from
operations which are yet in the pipeline. This results in a
"pipeline stall", which dramatically slows down the processor.

Another shortcoming of previous microprocessor-based systems
is that they have included only a single set of registers of any
given data type. In previous architectures, when an increased
number of registers has been desired within a given data type,
the solution has been simply to increase the size of the single
set of that type of register. This may result in addressing
problems, access conflict problems, and symmetry problems.

On a similar note, previous architectures have restricted
each given register set to one respective numerical data type.
Various prior systems have allowed general purpose registers to
hold either numerical data or address "data", but the present
application will not use the term "data" to include addresses.
What is intended may be best understood with reference to two
prior systems. The Intel 8085 microprocessor includes a register
pair "HL" which can be used to hold either two bytes of numerical
data or one two-byte address. The present application's
improvement is not directed to that issue. More on point, the
Intel 80486 microprocessor includes a set of general purpose
integer data registers and a set of floating point registers,
with each set being limited to its respective data type, at least
for purposes of direct register usage by arithmetic and logic
units.

This proves wasteful of the microprocessor's resources, such
as the available silicon area, when the microprocessor is
performing operations which do not involve both data types. For
example, user applications frequently involve exclusively integer
operations, and perform no floating point operations whatsoever.
When such a user application is run on a previous microprocessor
which includes floating point registers (such as the 80486),
those floating point registers remain idle during the entire
execution.

Another problem with previous microprocessor register set
architecture is observed in context switching or state switching
between a user application and a higher access privilege level
entity such as the operating system kernel. When control within
the microprocessor switches context, mode, or state, the
operating system kernel or other entity to which control is
passed typically does not operate on the same data which the user
application has been operating on. Thus, the data registers
typically hold data values which are not useful to the new
control entity but which must be maintained until the user
application is resumed. The kernel must generally have registers
for its own use, but typically has no way of knowing which
registers are presently in use by the user application. In order
to make space for its own data, the kernel must swap out or
otherwise store the contents of a predetermined subset of the
registers. This results in considerable loss of processing time
to overhead, especially if the kernel makes repeated,
short-duration assertions of control.

On a related note, in prior microprocessors, when it is
required that a "grand scale" context switch be made, it has
been necessary for the microprocessor to expend even greater
amounts of processing resources, including a generally large
number of processing cycles, to save all data and state
information before making the switch. When context is switched
back, the same performance penalty has previously been paid, to
restore the system to its former state. For example, if a
microprocessor is executing two user applications, each of which
requires the full complement of registers of each data type, and
each of which may be in various stages of condition code setting
operations or numerical calculations, each switch from one user
application to the other necessarily involves swapping or
otherwise saving the contents of every data register and state
flag in the system. This obviously involves a great deal of
operational overhead, resulting in significant performance
degradation, particularly if the main or the secondary storage
to which the registers must be saved is significantly slower than
the microprocessor itself.

Therefore, we have discovered that it is desirable to have
an improved microprocessor architecture which allows the various
component conditions of a complex condition to be calculated
without any intervening conditional branches. We have further
discovered that it is desirable that the plural simple conditions
be calculable in parallel, to improve throughput of the
microprocessor.

We have also discovered that it is desirable to have an
architecture which allows multiple register sets within a given
data type.

Additionally, we have discovered it to be desirable for a
microprocessor's floating point registers to be usable as integer
registers, in case the available integer registers are inadequate
to optimally hold the necessary amount of integer data.
Notably, we have discovered that it is desirable that such
re-typing be completely transparent to the user application.

We have discovered it to be highly desirable to have a
microprocessor which provides a dedicated subset of registers
which are reserved for use by the kernel in lieu of at least a
subset of the user registers, and that this new set of registers
should be addressable in exactly the same manner as the register
subset which they replace, in order that the kernel may use the
same register addressing scheme as user applications. We have
further observed that it is desirable that the switch between the
two subsets of registers require no microprocessor overhead
cycles, in order to maximally utilize the microprocessor's
resources.

Also, we have discovered it to be desirable to have a
microprocessor architecture which allows for a "grand scale"
context switch to be performed with minimal overhead. In this
vein, we have discovered that it is desirable to have an
architecture which allows for plural banks of register sets of
each type, such that two or more user applications may be
operating in a multi-tasking environment, or other "simultaneous"
mode, with each user application having sole access to at least
a full bank of registers. It is our discovery that the register
addressing scheme should, desirably, not differ between user
applications, nor between register banks, to maximize simplicity
of the user applications, and that the system should provide
hardware support for switching between the register banks so that
the user applications need not be aware of which register bank
they are presently using or even of the existence of other
register banks or of other user applications.

These and other advantages of our invention will be
appreciated with reference to the following description of our
invention, the accompanying drawings, and the claims.

SUMMARY OF THE INVENTION

The present invention provides a register file system
comprising: an integer register set including first and second
subsets of integer registers, and a shadow subset; a re-typable
set of registers which are individually usable as integer
registers or as floating point registers; and a set of
individually addressable Boolean registers.

The present invention includes integer and floating point
functional units which execute integer instructions accessing the
integer register set, and which operate in a plurality of modes.
In any mode, instructions are granted ordinary access to the
first subset of integer registers. In a first mode, instructions
are also granted ordinary access to the second subset. However,
in a second mode, instructions attempting to access the second
subset are instead granted access to the shadow subset, in a
manner which is transparent to the instructions. Thus, routines
may be written without regard to which mode they will operate in,
and system routines (which operate in the second mode) can have
at least the second subset seemingly at their disposal, without
having to expend the otherwise-required overhead of saving the
second subset's contents (which may be in use by user processes
operating in the first mode).

The invention further includes a plurality of integer
register sets, which are individually addressable as specified
by fields in instructions. The register sets include read ports
and write ports which are accessed by multiplexers, wherein the
multiplexers are controlled by contents of the register
set-specifying fields in the instructions.

One of the integer register sets is also usable as a
floating point register set. In one embodiment, this set is
sixty-four bits wide to hold double-precision floating point
data, but only the low order thirty-two bits are used by integer
instructions.

The invention includes functional units for performing
Boolean operations, and further includes a Boolean register set
for holding results of the Boolean operations such that no
dedicated, fixed-location status flags are required. The integer
and floating point functional units execute numerical comparison
instructions, which specify individual ones of the Boolean
registers to hold results of the comparisons. A Boolean
functional unit executes Boolean combinational instructions whose
sources and destination are specified registers in the Boolean
register set. Thus, the present invention may perform
conditional branches upon a single result of a complex Boolean
function without intervening conditional branch instructions
between the fundamental parts of the complex Boolean function,
minimizing pipeline disruption in the data processor.

Finally, there are multiple, identical register banks in the
system, each bank including the above-described register sets.
A bank may be allocated to a given process or routine, such that
the instructions within the routine need not specify upon which
bank they operate.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram of the instruction execution unit
of the microprocessor of the present invention, showing the
elements of the register file.

Figs. 2-4 are simplified schematic and block diagrams of the
floating point, integer and Boolean portions of the instruction
execution unit of Fig. 1, respectively.

Figs. 5-6 are more detailed views of the floating point and
integer portions, respectively, showing the means for selecting
between register sets.

Fig. 7 illustrates the fields of an exemplary microprocessor
instruction word executable by the instruction execution unit of
Fig. 1.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

I. REGISTER FILE 

Fig. 1 illustrates the basic components of the instruction
execution unit (IEU) 10 of the RISC (reduced instruction set
computing) processor of the present invention. The IEU 10
includes a register file 12 and an execution engine 14. The
register file 12 includes one or more register banks 16-0 to
16-n. It will be understood that the structure of each register
bank 16 is identical to all of the other register banks 16.
Therefore, the present application will describe only register
bank 16-0. The register bank includes a register set A 18, a
register set FB 20, and a register set C 22.

In general, the invention may be characterized as a RISC
microprocessor having a register file optimally configured for
use in the execution of RISC instructions, as opposed to
conventional register files which are sufficient for use in the
execution of CISC (complex instruction set computing)
instructions by CISC processors. By having a specially adapted
register file, the execution engine of the microprocessor's IEU
achieves greatly improved performance, both in terms of resource
utilization and in terms of raw throughput. The general concept
is to tune a register set to a RISC instruction, while the
specific implementation may involve any of the register sets in
the architecture.

A. Register Set A 

Register set A 18 includes integer registers 24 (RA[31:0]),
each of which is adapted to hold an integer value datum. In one
embodiment, each integer may be thirty-two bits wide. The RA[]
integer registers 24 include a first plurality 26 of integer
registers (RA[23:0]) and a second plurality 28 of integer
registers (RA[31:24]). The RA[] integer registers 24 are each
of identical structure, and are each addressable in the same
manner, albeit with a unique address within the integer register
set 24. For example, a first integer register 30 (RA[0]) is
addressable at a zero offset within the integer register set 24.

RA[0] always contains the value zero. It has been observed
that user applications and other programs use the constant value
zero more than any other constant value. It is, therefore,
desirable to have a zero readily available at all times, for
clearing, comparing, and other purposes. Another advantage of
having a constant, hard-wired value in a given register,
regardless of the particular value, is that the given register
may be used as the destination of any instruction whose results
need not be saved.

Also, this means that the fixed register will never be the
cause of a data dependency delay. A data dependency exists when
a "slave" instruction requires, for one or more of its operands,
the result of a "master" instruction. In a pipelined processor,
this may cause pipeline stalls. For example, the master
instruction, although occurring earlier in the code sequence
than the slave instruction, may take considerably longer to
execute. It will be readily appreciated that if a slave
"increment and store" instruction operates on the result data of
a master "quadruple-word integer divide" instruction, the slave
instruction will be fetched, decoded, and awaiting execution many
clock cycles before the master instruction has finished
execution. However, in certain instances, the numerical result
of a master instruction is not needed, and the master instruction
is executed for some other purpose only, such as to set condition
code flags. If the master instruction's destination is RA[0],
the numerical results will be effectively discarded. The data
dependency checker (not shown) of the IEU 10 will not cause the
slave instruction to be delayed, as the ultimate result of the
master instruction (zero) is already known.
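
The behavior of the hard-wired zero register can be illustrated with a minimal software sketch. The class name `IntegerRegisterSet` and the helper `must_wait_for` are hypothetical and are not part of the disclosure; in particular, the dependency checker is not shown in the drawings, so the check below is only an assumed simplification of its effect.

```python
class IntegerRegisterSet:
    """Toy model of RA[31:0] in which RA[0] is hard-wired to zero."""

    def __init__(self, size=32):
        self._regs = [0] * size

    def read(self, offset):
        # RA[0] always reads as zero, whatever was "written" to it.
        return 0 if offset == 0 else self._regs[offset]

    def write(self, offset, value):
        # A write to RA[0] is discarded; other registers keep 32 bits.
        if offset != 0:
            self._regs[offset] = value & 0xFFFFFFFF


def must_wait_for(source_offset, pending_destinations):
    """Hypothetical dependency check: an operand read from RA[0] never
    waits, because its value (zero) is known in advance."""
    return source_offset != 0 and source_offset in pending_destinations


ra = IntegerRegisterSet()
ra.write(0, 123)   # result discarded
ra.write(5, 7)     # ordinary register write
```

Here a slave instruction that reads RA[0] is never held up, even when an earlier instruction names RA[0] as its destination.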

The integer register set A 24 also includes a set of shadow
registers 32 (RT[31:24]). Each shadow register can hold an
integer value, and is, in one embodiment, also thirty-two bits
wide. Each shadow register is addressable as an offset in the
same manner in which each integer register is addressable.

Finally, the register set A includes an IEU mode integer
switch 34. The switch 34, like other such elements, need not
have a physical embodiment as a switch, so long as the
corresponding logical functionality is provided within the
register sets. The IEU mode integer switch 34 is coupled to the
first subset 26 of integer registers on line 36, to the second
subset of integer registers 28 on line 38, and to the shadow
registers 32 on line 40. All accesses to the register set A 18
are made through the IEU mode integer switch 34 on line 42. Any
access request to read or write a register in the first subset
RA[23:0] is passed automatically through the IEU mode integer
switch 34. However, accesses to an integer register with an
offset outside the first subset RA[23:0] will be directed either
to the second subset RA[31:24] or the shadow registers RT[31:24],
depending upon the operational mode of the execution engine 14.

The IEU mode integer switch 34 is responsive to a mode
control unit 44 in the execution engine 14. The mode control
unit 44 provides pertinent state or mode information about the
IEU 10 to the IEU mode integer switch 34 on line 46. When the
execution engine performs a context switch such as a transfer
to kernel mode, the mode control unit 44 controls the IEU mode
integer switch 34 such that any requests to the second subset
RA[31:24] are re-directed to the shadow RT[31:24], using the same
requested offset within the integer set. Any operating system
kernel or other then-executing entity may thus have apparent
access to the second subset RA[31:24] without the
otherwise-required overhead of swapping the contents of the
second subset RA[31:24] out to main memory, or pushing the second
subset RA[31:24] onto a stack, or other conventional
register-saving technique.
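
The redirection performed by the IEU mode integer switch can be sketched in software as follows. This is only an illustrative model: the class `RegisterSetA` and the `kernel_mode` attribute (standing in for the mode information on line 46) are assumed names, and the hard-wiring of RA[0] is omitted for brevity.

```python
class RegisterSetA:
    """Toy model of register set A with a mode-controlled shadow subset."""

    FIRST_SUBSET_SIZE = 24  # RA[23:0]; offsets 24..31 form the second subset

    def __init__(self):
        self.ra = [0] * 32                       # RA[31:0]
        self.rt = {n: 0 for n in range(24, 32)}  # shadow registers RT[31:24]
        self.kernel_mode = False                 # assumed stand-in for line 46

    def _bank(self, offset):
        # Offsets in the second subset go to RT[] only while in kernel mode;
        # the first subset is always reached directly.
        if offset >= self.FIRST_SUBSET_SIZE and self.kernel_mode:
            return self.rt
        return self.ra

    def read(self, offset):
        return self._bank(offset)[offset]

    def write(self, offset, value):
        self._bank(offset)[offset] = value & 0xFFFFFFFF


regs = RegisterSetA()
regs.write(30, 111)         # user code writes RA[30]
regs.kernel_mode = True     # transfer to kernel mode
regs.write(30, 999)         # same offset, but the value lands in RT[30]
kernel_view = regs.read(30)
regs.kernel_mode = False    # return to user mode
user_view = regs.read(30)   # the user's RA[30] was never disturbed
```

The kernel and the user code both address offset 30, yet neither saves nor restores anything; only the mode signal changes.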

When the execution engine 14 returns to normal user mode and
control passes to the originally-executing user application, the
mode control unit 44 controls the IEU mode integer switch 34 such
that access is again directed to the second subset RA[31:24].
In one embodiment, the mode control unit 44 is responsive to the
present state of interrupt enablement in the IEU 10. In one
embodiment, the execution engine 14 includes a processor status
register (PSR) (not shown), which includes a one-bit flag
(PSR[7]) indicating whether interrupts are enabled or disabled.
Thus, the line 46 may simply couple the IEU mode integer switch
34 to the interrupts-enabled flag in the PSR. While interrupts
are disabled, the IEU 10 maintains access to the integers
RA[23:0], in order that it may readily perform analysis of
various data of the user application. This may allow improved
debugging, error reporting, or system performance analysis.

B. Register Set FB

The re-typable register set FB 20 may be thought of as
including floating point registers 48 (RF[31:0]) and/or integer
registers 50 (RB[31:0]). When neither data type is implied to
the exclusion of the other, this application will use the term
RFB[]. In one embodiment, the floating point registers RF[]
occupy the same physical silicon space as the integer registers
RB[]. In one embodiment, the floating point registers RF[] are
sixty-four bits wide and the integer registers RB[] are
thirty-two bits wide. It will be understood that if
double-precision floating point numbers are not required, the
register set RFB[] may advantageously be constructed in a
thirty-two-bit width to save the silicon area otherwise required
by the extra thirty-two bits of each floating point register.

Each individual register in the register set RFB[] may hold
either a floating point value or an integer value. The register
set RFB[] may include optional hardware for preventing accidental
access of a floating point value as though it were an integer
value, and vice versa. In one embodiment, however, in the
interest of simplifying the register set RFB[], it is simply
left to the software designer to ensure that no erroneous usages
of individual registers are made. Thus, the execution engine 14
simply makes an access request on line 52, specifying an offset
into the register set RFB[], without specifying whether the
register at the given offset is intended to be used as a floating
point register or an integer register. Within the execution
engine 14, various entities may use either the full sixty-four
bits provided by the register set RFB[], or may use only the low
order thirty-two bits, such as in integer operations or
single-precision floating point operations.
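
A minimal sketch of such a re-typable register set follows, assuming a sixty-four-bit storage cell per register. The class `RetypableRegisterSet` and its method names are hypothetical. The choice to leave the upper thirty-two bits unchanged on an integer write is an assumption made only for this illustration; the text above states merely that integer operations use the low order thirty-two bits.

```python
import struct

MASK32 = 0xFFFFFFFF


class RetypableRegisterSet:
    """Toy model of RFB[31:0]: one 64-bit cell per register, untyped."""

    def __init__(self, size=32):
        self._bits = [0] * size  # raw 64-bit patterns; RFB[0] stays zero

    def write_float(self, offset, value):
        if offset != 0:
            # Store the IEEE double-precision bit pattern in all 64 bits.
            self._bits[offset] = int.from_bytes(struct.pack(">d", value), "big")

    def read_float(self, offset):
        return struct.unpack(">d", self._bits[offset].to_bytes(8, "big"))[0]

    def write_int(self, offset, value):
        if offset != 0:
            # Assumption: an integer write leaves the upper 32 bits unchanged.
            high = self._bits[offset] & ~MASK32
            self._bits[offset] = high | (value & MASK32)

    def read_int(self, offset):
        # Integer instructions see only the low-order 32 bits.
        return self._bits[offset] & MASK32


rfb = RetypableRegisterSet()
rfb.write_float(3, 1.5)   # register 3 used as RF[3]
rfb.write_int(4, 42)      # register 4 used as RB[4]
```

Because no type tag is stored, reading register 3 with `read_int` simply returns the low thirty-two bits of the double's bit pattern, which is the kind of erroneous usage the text leaves to the software designer to avoid.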

A first register RFB[0] 51 contains the constant value zero,
in a form such that RB[0] is a thirty-two-bit integer zero
(0000 hex) and RF[0] is a sixty-four-bit floating point zero
(00000000 hex). This provides the same advantages as described
above for RA[0].

C. Register Set C

The register set C 22 includes a plurality of Boolean
registers 54 (RC[31:0]). RC[] is also known as the "condition
status register" (CSR). The Boolean registers RC[] are each
identical in structure and addressing, albeit that each is
individually addressable at a unique address or offset within
RC[].
In one embodiment, register set C further includes a
"previous condition status register" (PCSR) 60, and the register
set C also includes a CSR selector unit 62, which is responsive
to the mode control unit 44 to select alternatively between the
CSR 54 and the PCSR 60. In the one embodiment, the CSR is used
when interrupts are enabled, and the PCSR is used when interrupts
are disabled. The CSR and PCSR are identical in all other
respects. In the one embodiment, when interrupts are set to be
disabled, the CSR selector unit 62 pushes the contents of the CSR
into the PCSR, overwriting the former contents of the PCSR, and
when interrupts are re-enabled, the CSR selector unit 62 pops the
contents of the PCSR back into the CSR. In other embodiments it
may be desirable to merely alternate access between the CSR and
the PCSR, as is done with RA[31:24] and RT[31:24]. In any event,
the PCSR is always available as a thirty-two-bit "special
register".

None of the Boolean registers is a dedicated condition flag,
unlike the Boolean registers in previously known microprocessors.
That is, the CSR 54 does not include a dedicated carry flag, nor
a dedicated minus flag, nor a dedicated flag indicating
equality of a comparison or a zero subtraction result. Rather,
any Boolean register may be the destination of the Boolean result
of any Boolean operation. As with the other register sets, a
first Boolean register 58 (RC[0]) always contains the value zero,
to obtain the advantages explained above for RA[0]. In the
preferred embodiment, each Boolean register is one bit wide,
indicating one Boolean value.

II. EXECUTION ENGINE

The execution engine 14 includes one or more integer
functional units 66, one or more floating point functional units
68, and one or more Boolean functional units 70. The functional
units execute instructions as will be explained below. Buses 72,
73, and 75 connect the various elements of the IEU 10, and will
each be understood to represent data, address, and control paths.

A. Instruction Format

Fig. 7 illustrates one exemplary format for an integer
instruction which the execution engine 14 may execute. It will
be understood that not all instructions need to adhere strictly
to the illustrated format, and that the data processing system
includes an instruction fetcher and decoder (not shown) which are
adapted to operate upon varying format instructions. The single
example of Fig. 7 is for ease in explanation only. Throughout
this Application the identification I[] will be used to identify
various bits of the instruction. I[31:30] are reserved for
future implementations of the execution engine 14. I[29:26]
identify the instruction class of the particular instruction.
Table 1 shows the various classes of instructions performed by
the present invention.

TABLE 1
Instruction Classes

Class    Instructions
0-3      Integer and floating point register-to-register instructions
4        Immediate constant load
5        Reserved
6        Load
7        Store
8-11     Control Flow
12       Modifier
13       Boolean operations
14       Reserved
15       Atomic (extended)


Instruction classes of particular interest to this
Application include the Class 0-3 register-to-register
instructions and the Class 13 Boolean operations. While other
classes of instructions also operate upon the register file 12,
further discussion of those classes is not believed necessary in
order to fully understand the present invention.

I[25] is identified as B0, and indicates whether the
destination register is in register set A or register set B.
I[24:22] are an opcode which identifies, within the given
instruction class, which specific function is to be performed.
For example, within the register-to-register classes an opcode
may specify "addition". I[21] identifies the addressing mode
which is to be used when performing the instruction: either
register source addressing or immediate source addressing.
I[20:16] identify the destination register as an offset within
the register set indicated by B0. I[15] is identified as B1 and
indicates whether the first operand is to be taken from register
set A or register set B. I[14:10] identify the register offset
from which the first operand is to be taken. I[9:8] identify a
function selection, an extension of the opcode I[24:22].
I[7:6] are reserved. I[5] is identified as B2 and indicates
whether a second operand of the instruction is to be taken from
register set A or register set B. Finally, I[4:0] identify the
register offset from which the second operand is to be taken.
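
The field layout described above can be illustrated with a short Python sketch. The function name `decode_fields` and the dictionary keys are hypothetical labels chosen for this illustration; only the bit positions are taken from the text.

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit instruction word I[31:0] into the fields named in the text.

    The key names are illustrative only; the bit ranges follow the description.
    """
    def bits(hi: int, lo: int) -> int:
        # Extract I[hi:lo], inclusive on both ends.
        return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

    return {
        "reserved_hi": bits(31, 30),   # reserved for future implementations
        "instr_class": bits(29, 26),   # instruction class (Table 1)
        "b0": bits(25, 25),            # destination in set A (0) or set B (1)
        "opcode": bits(24, 22),        # function within the class
        "addr_mode": bits(21, 21),     # register or immediate source addressing
        "dest": bits(20, 16),          # destination register offset
        "b1": bits(15, 15),            # first operand in set A or set B
        "src1": bits(14, 10),          # first operand register offset
        "func_sel": bits(9, 8),        # extension of the opcode
        "reserved_lo": bits(7, 6),     # reserved
        "b2": bits(5, 5),              # second operand in set A or set B
        "src2": bits(4, 0),            # second operand register offset
    }


# Example: class 1, destination register 7 of set B, sources register 3 of
# set A and register 9 of set B, opcode 2.
example = (1 << 26) | (1 << 25) | (2 << 22) | (7 << 16) | (3 << 10) | (1 << 5) | 9
fields = decode_fields(example)
```

The sketch only separates the fields; it does not attempt to model how the functional units interpret them.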

With reference to Fig. 1, the integer functional unit 66 and
floating point functional unit 68 are equipped to perform integer
comparison instructions and floating point comparisons,
respectively. The instruction format for the comparison
instruction is substantially identical to that shown in Fig. 7,
with the caveat that various fields may advantageously be
identified by slightly different names. I[20:16] identifies the
destination register where the result is to be stored, but the
addressing mode field I[21] does not select between register sets
A or B. Rather, the addressing mode field indicates whether the
second source of the comparison is found in a register or is
immediate data. Because the comparison is a Boolean type
instruction, the destination register is always found in register
set C. All other fields function as shown in Fig. 7. In
performing Boolean operations within the integer and floating
point functional units, the opcode and function select fields
identify which Boolean condition is to be tested for in comparing
the two operands. The integer and the floating point functional
units fully support the IEEE standards for numerical comparisons.

The IEU 10 is a load/store machine. This means that when
the contents of a register are stored to memory or read from
memory, an address calculation must be performed in order to
determine which location in memory is to be the destination or
the source of the store or load, respectively. When this is
the case, the destination register field I[20:16] identifies the
register which is the destination or the source of the load or
store, respectively. The source register 1 field, I[14:10],
identifies a register in either set A or B which contains a base
address of the memory location. In one embodiment, the source
register 2 field, I[4:0], identifies a register in set A or set
B which contains an index or an offset from the base. The
load/store address is calculated by adding the index to the base.
In another mode, I[7:0] include immediate data which are to be
added as an index to the base.
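
The two address modes just described can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `load_store_address`, the unsigned treatment of the immediate I[7:0], and the wrap to thirty-two bits are all assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit address width


def load_store_address(regs, src1, src2=None, imm8=None):
    """Return base + index for a load or store.

    regs  : list of register values (a single flat list, for simplicity)
    src1  : register number holding the base address (field I[14:10])
    src2  : register number holding the index (field I[4:0]), or None
    imm8  : immediate index taken from I[7:0], or None
    """
    base = regs[src1]
    if imm8 is not None:
        index = imm8 & 0xFF      # assumption: immediate treated as unsigned
    elif src2 is not None:
        index = regs[src2]
    else:
        raise ValueError("either src2 or imm8 must be given")
    return (base + index) & MASK32  # assumption: address wraps at 32 bits


regs = [0] * 32
regs[3] = 0x1000   # base
regs[9] = 0x20     # index
register_mode = load_store_address(regs, 3, src2=9)
immediate_mode = load_store_address(regs, 3, imm8=0x10)
```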

B. Operation of the Instruction Execution Unit and Register Sets

It will be understood by those skilled in the art that the
integer functional unit 66, the floating point functional unit
68, and the Boolean functional unit 70 are responsive to the
contents of the instruction class field, the opcode field, and
the function select field of a present instruction being
executed.

1. Integer Operations

For example, when the instruction class, the opcode, and
function select indicate that an integer register-to-register
addition is to be performed, the integer functional unit may be
responsive thereto to perform the indicated operation, while the
floating point functional unit and the Boolean functional unit
may be responsive thereto to not perform the operation. As will
be understood from the cross-referenced applications, however,
the floating point functional unit 68 is equipped to perform both
floating point and integer operations. Also, the functional
units are constructed to each perform more than one instruction
simultaneously.

The integer functional unit 66 performs integer functions
only. Integer operations typically involve a first source, a
second source, and a destination. A given integer instruction
will specify a particular operation to be performed on one or
more source operands and will specify that the result of the
integer operation is to be stored at a given destination. In
some instructions, such as address calculations employed in
load/store operations, the sources are utilized as a base and
an index. The integer functional unit 66 is coupled to a first
bus 72 over which the integer functional unit 66 is connected to
a switching and multiplexing control (SMC) unit A 74 and an SMC
unit B 76. Each integer instruction executed by the integer
functional unit 66 will specify whether each of its sources and
destination reside in register set A or register set B.
Suppose that the IEU 10 has received, from the instruction
fetch unit (not shown), an instruction to perform an integer
register-to-register addition. In various embodiments, the
instruction may specify a register bank, perhaps even a separate
bank for each source and destination. In one embodiment, the
instruction I[] is limited to a thirty-two-bit length, and does
not contain any indication of which register bank 16-0 through
16-n is involved in the instruction. Rather, the bank selector
unit 78 controls which register bank is presently active. In
one embodiment, the bank selector unit 78 is responsive to one
or more bank selection bits in a status word (not shown) within
the IEU 10.

In order to perform the integer addition instruction, the
integer functional unit 66 is responsive to the identification
in I[14:10] and I[4:0] of the first and second source registers.
The integer functional unit 66 places an identification of the
first and second source registers at ports S1 and S2,
respectively, onto the integer functional unit bus 72 which is
coupled to both SMC units A and B 74 and 76. In one embodiment,
the SMC units A and B are each coupled to receive B0-2 from the
instruction I[]. In one embodiment, a zero in any respective Bn
indicates register set A, and a one indicates register set B.
During load/store operations, the source ports of the integer
and floating point functional units 66 and 68 are utilized as a
base port and an index port, B and I, respectively.

After obtaining the first and second operands from the
indicated register sets on the bus 72, as explained below, the
integer functional unit 66 performs the indicated operation upon
those operands, and provides the result at port D onto the
integer functional unit bus 72. The SMC units A and B are
responsive to B0 to route the result to the appropriate register
set A or B.

The SMC unit B is further responsive to the instruction
class, opcode, and function selection to control whether operands
are read from (or results are stored to) either a floating point
register RF[] or an integer register RB[]. As indicated, in one
embodiment, the registers RF[] may be sixty-four bits wide while
the registers RB[] are only thirty-two bits wide. Thus, SMC
unit B controls whether a word or a double word is written to the
register set RFB[]. Because all registers within register set
A are thirty-two bits wide, SMC unit A need not include means for
controlling the width of data transfer on the bus 42.

All data on the bus 42 are thirty-two bits wide, but other
sorts of complexities exist within register set A. The IEU mode
integer switch 34 is responsive to the mode control unit 44 of
the execution engine 14 to control whether data on the bus 42 are
connected through to bus 36, bus 38 or bus 40, and vice versa.
IEU mode integer switch 34 is further responsive to
I[20:16], I[14:10], and I[4:0]. If a given indicated destination
or source is in RA[23:0], the IEU mode integer switch 34
automatically couples the data between lines 42 and 36. However,
for registers RA[31:24], the IEU mode integer switch 34
determines whether data on line 42 is connected to line 38 or
line 40, and vice versa. When interrupts are enabled, IEU mode
integer switch 34 connects the SMC unit A to the second subset
28 of integer registers RA[31:24]. When interrupts are disabled,
the IEU mode integer switch 34 connects the SMC unit A to the
shadow registers RT[31:24]. Thus, an instruction executing
within the integer functional unit 66 need not be concerned with
whether to address RA[31:24] or RT[31:24]. It will be understood
that SMC unit A may advantageously operate identically whether
it is being accessed by the integer functional unit 66 or by the
floating point functional unit 68.
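
The transparent re-routing performed by the IEU mode integer switch can be modelled in software as follows. This is a behavioural sketch only; the class name `RegisterSetA`, its attribute names, and the use of a single `interrupts_enabled` flag as the mode input are assumptions made for illustration.

```python
class RegisterSetA:
    """Behavioural model of RA[31:0] with shadow registers RT[31:24]."""

    def __init__(self):
        self.ra = [0] * 32               # RA[0] .. RA[31]
        self.rt = [0] * 8                # RT[24] .. RT[31], stored at 0..7
        self.interrupts_enabled = True   # hypothetical stand-in for the mode

    def _select(self, n):
        # RA[23:0] is always reached directly. RA[31:24] is reached only while
        # interrupts are enabled; otherwise the access lands in RT[31:24].
        if not 0 <= n <= 31:
            raise IndexError("register number out of range")
        if n >= 24 and not self.interrupts_enabled:
            return self.rt, n - 24
        return self.ra, n

    def read(self, n):
        bank, i = self._select(n)
        return bank[i]

    def write(self, n, value):
        bank, i = self._select(n)
        bank[i] = value & 0xFFFFFFFF     # registers are thirty-two bits wide


regs = RegisterSetA()
regs.write(30, 5)                 # user code writes RA[30]
regs.interrupts_enabled = False   # a system routine takes over
regs.write(30, 99)                # same register number, lands in RT[30]
regs.interrupts_enabled = True
user_value = regs.read(30)        # the user's value was never disturbed
```

The instruction stream uses the same register number in both modes, which is the point of the arrangement: the system routine does not have to save and restore the user's RA[31:24].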

2. Floating Point Operations

The floating point functional unit 68 is responsive to the
class, opcode, and function select fields of the instruction, to
perform floating point operations. The S1, S2, and D ports
operate as described for the integer functional unit 66. SMC
unit B is responsive to retrieve floating point operands from,
and to write numerical floating point results to, the floating
point registers RF[] on bus 52.

3. Boolean Operations

SMC unit C 80 is responsive to the instruction class,
opcode, and function select fields of the instruction. When
SMC unit C detects that a comparison operation has been performed
by one of the numerical functional units 66 or 68, it writes the
Boolean result over bus 56 to the Boolean register indicated at
the D port of the functional unit which performed the comparison.
The Boolean functional unit 70 does not perform comparison
instructions as do the integer and floating point functional
units 66 and 68. Rather, the Boolean functional unit 70 is only
used in performing bitwise logical combination of Boolean
register contents, according to the Boolean functions listed in
Table 2.

TABLE 2
Boolean Functions

I[23,22,9,8]    Boolean result calculation
------------    --------------------------
0000            ZERO
0001            S1 AND S2
0010            S1 AND (NOT S2)
0011            S1
0100            (NOT S1) AND S2
0101            S2
0110            S1 XOR S2
0111            S1 OR S2
1000            S1 NOR S2
1001            S1 XNOR S2
1010            NOT S2
1011            S1 OR (NOT S2)
1100            NOT S1
1101            (NOT S1) OR S2
1110            S1 NAND S2
1111            ONE
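
One way to read Table 2, offered here as an observation rather than something the text states, is that the four-bit code is itself the truth table of the function: each bit of the code gives the result for one combination of S1 and S2. The sketch below assumes that reading; the function name `boolean_function` is hypothetical.

```python
def boolean_function(code: int, s1: int, s2: int) -> int:
    """Evaluate a Table 2 function, treating the 4-bit code as a truth table.

    Bit 0 of the code is the result for (S1, S2) = (1, 1), bit 1 for (1, 0),
    bit 2 for (0, 1), and bit 3 for (0, 0). This mapping is inferred from the
    table entries, not quoted from the text.
    """
    index = 3 - (2 * s1 + s2)
    return (code >> index) & 1


# Spot checks against a few rows of Table 2.
and_result = boolean_function(0b0001, 1, 1)    # S1 AND S2
xor_result = boolean_function(0b0110, 1, 0)    # S1 XOR S2
nand_result = boolean_function(0b1110, 1, 1)   # S1 NAND S2
```

Under this reading all sixteen two-input Boolean functions are available, which matches the sixteen rows of the table.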


The advantage which the present invention obtains by having
a plurality of homogenous Boolean registers, each of which is
individually addressable as the destination of a Boolean
operation, will be explained with reference to Tables 3-5. Table
3 illustrates an example of a segment of code which performs a
conditional branch based upon a complex Boolean function. The
complex Boolean function includes three portions which are OR-ed
together. The first portion includes two sub-portions, which
are AND-ed together.


TABLE 3
Example of Complex Boolean Function

1    RA[1] := 0;
2    IF (((RA[2] = RA[3]) AND (RA[4] > RA[5])) OR
3        (RA[6] < RA[7]) OR
4        (RA[8] <> RA[9])) THEN
5        X()
6    ELSE
7        Y()
8    RA[10] := 1;


Table 4 illustrates, in pseudo-assembly form, one likely
method by which previous microprocessors would perform the
function of Table 3. The code in Table 4 is written as though
it were constructed by a compiler of at least normal intelligence
operating upon the code of Table 3. That is, the compiler will
recognize that the condition expressed in lines 2-4 of Table 3
is passed if any of the three portions is true.

TABLE 4
Execution of Complex Boolean Function
Without Boolean Register Set

 1   START       LDI    RA[1],0
 2   TEST1       CMP    RA[2],RA[3]
 3               BNE    TEST2
 4               CMP    RA[4],RA[5]
 5               BGT    DO_IF
 6   TEST2       CMP    RA[6],RA[7]
 7               BLT    DO_IF
 8   TEST3       CMP    RA[8],RA[9]
 9               BEQ    DO_ELSE
10   DO_IF       JSR    ADDRESS OF X()
11               JMP    PAST_ELSE
12   DO_ELSE     JSR    ADDRESS OF Y()
13   PAST_ELSE   LDI    RA[10],1


The assignment at line 1 of Table 3 is performed by the
"load immediate" statement at line 1 of Table 4. The first
portion of the complex Boolean condition, expressed at line 2
of Table 3, is represented by the statements in lines 2-5 of
Table 4. To test whether RA[2] equals RA[3], the compare
statement at line 2 of Table 4 performs a subtraction of RA[2]
from RA[3] or vice versa, depending upon the implementation, and
may or may not store the result of that subtraction. The
important function performed by the comparison statement is that
the zero, minus, and carry flags will be appropriately set or
cleared.

The conditional branch statement at line 3 of Table 4
branches to a subsequent portion of code upon the condition that
RA[2] did not equal RA[3]. If the two were unequal, the zero
flag will be clear, and there is no need to perform the second
sub-portion. The existence of the conditional branch statement
at line 3 of Table 4 prevents the further fetching, decoding, and
executing of any subsequent statement in Table 4 until the
results of the comparison in line 2 are known, causing a pipeline
stall. If the first sub-portion of the first portion (TEST1) is
passed, the second sub-portion at line 4 of Table 4 then compares
RA[4] to RA[5], again setting and clearing the appropriate status
flags.

If RA[2] equals RA[3], and RA[4] is greater than RA[5],
there is no need to test the remaining two portions (TEST2 and
TEST3) in the complex Boolean function, and the statement at
Table 4, line 5, will conditionally branch to the label DO_IF,
to perform the operation inside the "IF" of Table 3. However,
if the first portion of the test is failed, additional processing
is required to determine which of the "IF" and "ELSE" portions
should be executed.

The second portion of the Boolean function is the comparison
of RA[6] to RA[7], at line 6 of Table 4, which again sets and
clears the appropriate status flags. If the condition "less
than" is indicated by the status flags, the complex Boolean
function is passed, and execution may immediately branch to the
DO_IF label. In various prior microprocessors, the "less than"
condition may be tested by examining the minus flag. If RA[6]
was not less than RA[7], the third portion of the test must be
performed. The statement at line 8 of Table 4 compares RA[8] to
RA[9]. If this comparison is failed, the "ELSE" code should be
executed; otherwise, execution may simply fall through to the
"IF" code at line 10 of Table 4, which is followed by an
additional jump around the "ELSE" code. Each of the conditional
branches in Table 4, at lines 3, 5, 7 and 9, results in a
separate pipeline stall, significantly increasing the processing
time required for handling this complex Boolean function.

The greatly improved throughput which results from employing
the Boolean register set C of the present invention will now
readily be seen with specific reference to Table 5.


TABLE 5
Execution of Complex Boolean Function
With Boolean Register Set

 1   START       LDI    RA[1],0
 2   TEST1       CMP    RC[11],RA[2],RA[3],EQ
 3               CMP    RC[12],RA[4],RA[5],GT
 4   TEST2       CMP    RC[13],RA[6],RA[7],LT
 5   TEST3       CMP    RC[14],RA[8],RA[9],NE
 6   COMPLEX     AND    RC[15],RC[11],RC[12]
 7               OR     RC[16],RC[13],RC[14]
 8               OR     RC[17],RC[15],RC[16]
 9               BC     RC[17],DO_ELSE
10   DO_IF       JSR    ADDRESS OF X()
11               JMP    PAST_ELSE
12   DO_ELSE     JSR    ADDRESS OF Y()
13   PAST_ELSE   LDI    RA[10],1


Most notably seen at lines 2-5 of Table 5, the Boolean
register set C allows the microprocessor to perform the three
test portions back-to-back without intervening branching. Each
Boolean comparison specifies two operands, a destination, and a
Boolean condition for which to test. For example, the comparison
at line 2 of Table 5 compares the contents of RA[2] to the
contents of RA[3], tests them for equality, and stores into
RC[11] the Boolean value of the result of the comparison. Note
that each comparison of the Boolean function stores its
respective intermediate results in a separate Boolean register.
As will be understood with reference to the above-referenced
related applications, the IEU 10 is capable of simultaneously
performing more than one of the comparisons.

After at least the first two comparisons at lines 2-3 of
Table 5 have been completed, the two respective comparison
results are AND-ed together as shown at line 6 of Table 5.
RC[15] then holds the result of the first portion of the test.
The results of the second and third portions of the Boolean
function are OR-ed together as seen in Table 5, line 7. It will
be understood that, because there are no data dependencies
involved, the AND at line 6 and the OR at line 7 may be
performed in parallel. Finally, the results of those two
operations are OR-ed together as seen at line 8 of Table 5. It
will be understood that register RC[17] will then contain a
Boolean value indicating the truth or falsity of the entire
complex Boolean function of Table 3. It is then possible to
perform a single conditional branch, shown at line 9 of Table 5.
In the mode shown in Table 5, the method branches to the "ELSE"
code if Boolean register RC[17] is clear, indicating that the
complex function was failed. The remainder of the code may be
the same as it was without the Boolean register set as seen in
Table 4.
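
The data flow of Table 5 can be traced with a small Python model. It is a sketch for checking the logic only: the function names are hypothetical, the registers are held in a plain list and dictionary, and nothing here models timing, parallel issue, or pipeline behaviour.

```python
def table3_condition(ra):
    """The complex Boolean function of Table 3, written directly."""
    return ((ra[2] == ra[3]) and (ra[4] > ra[5])) or (ra[6] < ra[7]) or (ra[8] != ra[9])


def table5_sequence(ra):
    """Follow lines 2-8 of Table 5; return the final value of RC[17].

    Each comparison writes its own Boolean register, so no branch is needed
    until the whole function has been evaluated.
    """
    rc = {}
    rc[11] = ra[2] == ra[3]       # line 2: CMP ... EQ
    rc[12] = ra[4] > ra[5]        # line 3: CMP ... GT
    rc[13] = ra[6] < ra[7]        # line 4: CMP ... LT
    rc[14] = ra[8] != ra[9]       # line 5: CMP ... NE
    rc[15] = rc[11] and rc[12]    # line 6: AND (independent of line 7)
    rc[16] = rc[13] or rc[14]     # line 7: OR  (independent of line 6)
    rc[17] = rc[15] or rc[16]     # line 8: OR
    return rc[17]                 # line 9 branches to DO_ELSE if this is clear


ra = [0, 0, 1, 1, 5, 4, 3, 3, 7, 7]   # RA[0] .. RA[9], sample values
takes_if_branch = table5_sequence(ra)
```

With the sample values only the first portion of the test is true, so RC[17] is set and the single branch at line 9 is not taken.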

The Boolean functional unit 70 is responsive to the
instruction class, opcode, and function select fields as are the
other functional units. Thus, it will be understood with
reference to Table 5 again, that the integer and/or floating
point functional units will perform the instructions in lines 1-5
and 13, and the Boolean functional unit 70 will perform the
Boolean bitwise combination instructions in lines 6-8. The
control flow and branching instructions in lines 9-12 will be
performed by elements of the IEU 10 which are not shown in
Fig. 1.

III. DATA PATHS

Figs. 2-5 illustrate further details of the data paths
within the floating point, integer, and Boolean portions of the
IEU, respectively.

A. Floating Point Portion Data Paths

As seen in Fig. 2, the register set FB 20 is a multi-ported
register set. In one embodiment, the register set FB 20 has two
write ports WFB0-1, and five read ports RDFB0-4. The floating
point functional unit 68 of Fig. 1 is comprised of the ALU2 102,
FALU 104, MULT 106, and NULL 108 of Fig. 2. All elements of Fig.
2 except the register set 20 and the elements 102-108 comprise
the SMC unit B of Fig. 1.

External, bidirectional data bus EX_DATA[] provides data to
the floating point load/store unit 122. Immediate floating point
data bus LDF_IMED[] provides data from a "load immediate"
instruction. Other immediate floating point data are provided
on busses RFF1_IMED and RFF2_IMED, such as is involved in an "add
immediate" instruction. Data are also provided on bus
EX_SR_DT[], in response to a "special register move" instruction.
Data may also arrive from the integer portion, shown in Fig. 3,
on busses 114 and 120.

The floating point register set's two write ports WFB0 and
WFB1 are coupled to write multiplexers 110-0 and 110-1,
respectively. The write multiplexers 110 receive data from: the
ALU0 or SHF0 of the integer portion of Fig. 3; the FALU; the
MULT; the ALU2; either EX_SR_DT[] or LDF_IMED[]; and
EX_DATA[]. Those skilled in the art will understand that control
signals (not shown) determine which input is selected at each
port, and address signals (not shown) determine to which register
the input data are written. Multiplexer control and register
addressing are within the skill of persons in the art, and will
not be discussed for any multiplexer or register set in the
present invention.

The floating point register set's five read ports RDFB0 to
RDFB4 are coupled to read multiplexers 112-0 to 112-4,
respectively. The read multiplexers each also receives data
from: either EX_SR_DT[] or LDF_IMED[], on load immediate bypass
bus 126; a load external data bypass bus 127, which allows
external load data to skip the register set FB; the output of
the ALU2 102, which performs non-multiplication integer
operations; the FALU 104, which performs non-multiplication
floating point operations; the MULT 106, which performs
multiplication operations; and either the ALU0 140 or the SHF0
144 of the integer portion shown in Fig. 3, which respectively
perform non-multiplication integer operations and shift
operations. Read multiplexers 112-1 and 112-3 also receive data
from RFF1_IMED[] and RFF2_IMED[], respectively.

Each arithmetic-type unit 102-106 in the floating point
portion receives two inputs, from respective sets of first and
second source multiplexers S1 and S2. The first source of each
unit ALU2, FALU, and MULT comes from the output of either read
multiplexer 112-0 or 112-2, and the second source comes from the
output of either read multiplexer 112-1 or 112-3. The sources
of the FALU and the MULT may also come from the integer portion
of Fig. 3 on bus 114.

The results of the ALU2, FALU, and MULT are provided back
to the write multiplexers 110 for storage into the floating point
registers RF[], and also to the read multiplexers 112 for re-use
as operands of subsequent operations. The FALU also outputs a
signal FALU_BD indicating the Boolean result of a floating point
comparison operation. FALU_BD is calculated directly from
internal zero and sign flags of the FALU.

Null byte tester NULL 108 performs null byte testing
operations upon an operand from a first source multiplexer, in
one mode that of the ALU2. NULL 108 outputs a Boolean signal
NULLB_BD indicating whether the thirty-two-bit first source
operand includes a byte of value zero.
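
The test performed by the null byte tester can be stated compactly in software. The following is a functional sketch of the result that a signal such as NULLB_BD reports, not a description of the hardware; the function name `has_null_byte` is hypothetical.

```python
def has_null_byte(word: int) -> bool:
    """Return True if any of the four bytes of a 32-bit operand is zero."""
    word &= 0xFFFFFFFF  # consider only the low thirty-two bits
    return any(((word >> shift) & 0xFF) == 0 for shift in (0, 8, 16, 24))


terminated = has_null_byte(0x41424300)   # low byte is zero
unterminated = has_null_byte(0x41424344)  # no zero byte present
```

A test of this kind is commonly useful when scanning null-terminated strings a word at a time.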

The outputs of read multiplexers 112-0, 112-1, and 112-4 are 
provided to the integer portion (of Fig. 3) on bus 118. The 
output of read multiplexer 112-4 is also provided as STDT_FP[] 
store data to the floating point load/store unit 122. 

Fig. 5 illustrates further details of the control of the
S1 and S2 multiplexers. As seen, in one embodiment, each S1
multiplexer may be responsive to bit B1 of the instruction I[],
and each S2 multiplexer may be responsive to bit B2 of the
instruction I[]. The S1 and S2 multiplexers select the sources
for the various functional units. The sources may come from
either of the register files, as controlled by the B1 and B2 bits
of the instruction itself. Additionally, each register file
includes two read ports from which the sources may come, as
controlled by hardware not shown in the Figs.

B. Integer Portion Data Paths

As seen in Fig. 3, the register set A 18 is also
multi-ported. In one embodiment, the register set A 18 has two
write ports WA0-1, and five read ports RDA0-4. The integer
functional unit 66 of Fig. 1 is comprised of the ALU0 140, ALU1
142, SHF0 144, and NULL 146 of Fig. 3. All elements of Fig. 3
except the register set 18 and the elements 140-146 comprise the
SMC unit A of Fig. 1.

External data bus EX_DATA[] provides data to the integer
load/store unit 152. Immediate integer data on bus LDI_IMED[]
are provided in response to a "load immediate" instruction.
Other immediate integer data are provided on busses RFA1_IMED and
RFA2_IMED in response to non-load immediate instructions, such
as an "add immediate". Data are also provided on bus EX_SR_DT[]
in response to a "special register move" instruction. Data may
also arrive from the floating point portion (shown in Fig. 2)
on busses 116 and 118.

The integer register set's two write ports WA0 and WA1 are
coupled to write multiplexers 148-0 and 148-1, respectively. The
write multiplexers 148 receive data from: the FALU or MULT of
the floating point portion (of Fig. 2); the ALU0; the ALU1;
the SHF0; either EX_SR_DT[] or LDI_IMED[]; and EX_DATA[].

The integer register set's five read ports RDA0 to RDA4 are
coupled to read multiplexers 150-0 to 150-4, respectively. Each
read multiplexer also receives data from: either EX_SR_DT[] or
LDI_IMED[] on load immediate bypass bus 160; a load external
data bypass bus 154, which allows external load data to skip the
register set A; ALU0; ALU1; SHF0; and either the FALU or the
MULT of the floating point portion (of Fig. 2). Read
multiplexers 150-1 and 150-3 also receive data from RFA1_IMED[]
and RFA2_IMED[], respectively.

Each arithmetic-type unit 140-144 in the integer portion
receives two inputs, from respective sets of first and second
source multiplexers S1 and S2. The first source of ALU0 comes
from either the output of read multiplexer 150-2, or a
thirty-two-bit wide constant zero (0000 hex), or floating point
read multiplexer 112-4. The second source of ALU0 comes from
either read multiplexer 150-3 or floating point read multiplexer
112-1. The first source of ALU1 comes from either read
multiplexer 150-0 or IF_PC[]. IF_PC[] is used in calculating a
return address needed by the instruction fetch unit (not shown),
due to the IEU's ability to perform instructions in an
out-of-order sequence. The second source of ALU1 comes from
either read multiplexer 150-1 or CF_OFFSET[]. CF_OFFSET[] is
used in calculating a return address for a CALL instruction, also
due to the out-of-order capability.

The first source of the shifter SHF0 144 is from either:
floating point read multiplexer 112-0 or 112-4; or any integer
read multiplexer 150. The second source of SHF0 is from either:
floating point read multiplexer 112-0 or 112-4; or integer read
multiplexer 150-0, 150-2, or 150-4. SHF0 takes a third input
from a shift amount multiplexer (SA). The third input controls
how far to shift, and is taken by the SA multiplexer from either:
floating point read multiplexer 112-1; integer read multiplexer
150-1 or 150-3; or a five-bit wide constant thirty-one (11111
binary, or 31 decimal). The shifter SHF0 requires a fourth input
from the size multiplexer (S). The fourth input controls how much
data to shift, and is taken by the S multiplexer from either: read
multiplexer 150-1; read multiplexer 150-3; or a five-bit wide
constant sixteen (10000 binary, or 16 decimal).

The results of the ALU0, ALU1, and SHF0 are provided back
to the write multiplexers 148 for storage into the integer
registers RA[], and also to the read multiplexers 150 for re-use
as operands of subsequent operations. The output of either ALU0
or SHF0 is provided on bus 120 to the floating point portion of
Fig. 2. The ALU0 and ALU1 also output signals ALU0_BD and
ALU1_BD, respectively, indicating the Boolean results of integer
comparison operations. ALU0_BD and ALU1_BD are calculated
directly from the zero and sign flags of the respective
functional units. ALU0 also outputs signals EX_EADR[] and
EX_VM_ADR[]. EX_EADR[] is the target address generated for an
absolute branch instruction, and is sent to the IFU (not shown)
for fetching the target instruction. EX_VM_ADR[] is the virtual
address used for all loads from memory and stores to memory, and
is sent to the VMU (not shown) for address translation.

Null byte tester NULL 146 performs null byte testing
operations upon an operand from a first source multiplexer. In
one embodiment, the operand is from the ALU0. NULL 146 outputs
a Boolean signal NULLA_BD indicating whether the thirty-two-bit
first source operand includes a byte of value zero.

The outputs of read multiplexers 150-0 and 150-1 are
provided to the floating point portion (of Fig. 2) on bus 114.
The output of read multiplexer 150-4 is also provided as
STDT_INT[] store data to the integer load/store unit 152.

A control bit PSR[7] is provided to the register set A 18.
It is this signal which, in Fig. 1, is provided from the mode
control unit 44 to the IEU mode integer switch 34 on line 46.
The IEU mode integer switch is internal to the register set A 18
as shown in Fig. 3.

Fig. 6 illustrates further details of the control of the S1
and S2 multiplexers. The signal ALU0_BD

C. Boolean Portion Data Paths

As seen in Fig. 4, the register set C 22 is also
multi-ported. In one embodiment, the register set C 22 has two
write ports WC0-1, and five read ports RDC0-4. All elements of
Fig. 4 except the register set 22 and the Boolean combinational
unit 70 comprise the SMC unit C of Fig. 1.

The Boolean register set's two write ports WC0 and WC1 are
coupled to write multiplexers 170-0 and 170-1, respectively. The
write multiplexers 170 receive data from: the output of the
Boolean combinational unit 70, indicating the Boolean result of
a Boolean combinational operation; ALU0_BD from the integer
portion of Fig. 3, indicating the Boolean result of an integer
comparison; FALU_BD from the floating point portion of Fig. 2,
indicating the Boolean result of a floating point comparison;
either ALU1_BD_P from ALU1, indicating the results of a compare
instruction in ALU1, or NULLA_BD from NULL 146, indicating a null
byte in the integer portion; and either ALU2_BD_P from ALU2,
indicating the results of a compare operation in ALU2, or
NULLB_BD from NULL 108, indicating a null byte in the floating
point portion. In one mode, the ALU0_BD, ALU1_BD, ALU2_BD, and
FALU_BD signals are not taken from the data paths, but are
calculated as a function of the zero flag, minus flag, carry
flag, and other condition flags in the PSR. In one mode, wherein
up to eight instructions may be executing at one instant in the
IEU, the IEU maintains up to eight PSRs.

The Boolean register set C is also coupled to bus
EX_SR_DT[], for use with "special register move" instructions.
The CSR may be written or read as a whole, as though it were a
single thirty-two-bit register. This enables rapid saving and
restoration of machine state information, such as may be
necessary upon certain drastic system errors or upon certain
forms of grand scale context switching.
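
Reading or writing the Boolean register set as one thirty-two-bit word amounts to packing one bit per Boolean register. The sketch below illustrates the idea; the function names and the choice of placing RC[n] in bit n of the word are assumptions, since the text does not give the bit ordering.

```python
def pack_csr(rc):
    """Pack thirty-two Boolean register values into one 32-bit word.

    Assumption: RC[n] occupies bit n of the word.
    """
    if len(rc) != 32:
        raise ValueError("expected thirty-two Boolean registers")
    word = 0
    for n, bit in enumerate(rc):
        if bit:
            word |= 1 << n
    return word


def unpack_csr(word):
    """Restore thirty-two Boolean register values from a 32-bit word."""
    return [bool((word >> n) & 1) for n in range(32)]


rc = [False] * 32
rc[11] = rc[17] = True
saved = pack_csr(rc)          # one move saves all Boolean state
restored = unpack_csr(saved)  # one move restores it
```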

The Boolean register set's five read ports RDC0 to RDC4 are
coupled to read multiplexers 172-0 to 172-4, respectively. The
read multiplexers 172 receive the same set of inputs as the write
multiplexers 170 receive. The Boolean combinational unit 70
receives inputs from read multiplexers 172-0 and 172-1. Read
multiplexers 172-2 and 172-3 respectively provide signals
BLBP_CPORT and BLBP_DPORT. BLBP_CPORT is used as the basis for
conditional branching instructions in the IEU. BLBP_DPORT is
used in the "add with Boolean" instruction, which sets an integer
register in the A or B set to zero or one (with leading zeroes),
depending upon the content of a register in the C set. Read port
RDC4 is presently unused, and is reserved for future enhancements
of the Boolean functionality of the IEU.


IV. CONCLUSION

While the features and advantages of the present invention
have been described with respect to particular embodiments
thereof, and in varying degrees of detail, it will be appreciated
that the invention is not limited to the described embodiments.
The following Claims define the invention to be afforded patent
coverage.

CLAIMS

We claim: 

1. An apparatus executing a set of instructions, the
instructions including one or more fields, wherein a field of a
given instruction specifies a source of an operand of the given
instruction or a destination of a result of the given
instruction, and wherein the apparatus comprises:

processing means for executing the instructions; and

a register file, coupled to the processing means, for
storing operands and results of the instructions, wherein,

the register file includes a plurality of register
sets, and

the register file is responsive to one or more of the
fields in a given instruction to retrieve an operand of the given
instruction from, or store a result of the given instruction
into, a given register in a given one of the register sets as
identified by the one or more fields in the given instruction.

2. The apparatus of Claim 1, wherein the instructions
include Boolean combinational instructions each operating on one
or more Boolean operands to generate a Boolean result, each
Boolean combinational instruction including one or more Boolean
fields specifying a location of each operand and result, and
wherein:

the processing means includes Boolean execution means for
executing the Boolean combinational instructions;

the register file includes a Boolean register set of Boolean
registers, each Boolean register for holding one of said Boolean
operands or Boolean results; and

the register file is responsive to each said Boolean field
in a given Boolean combinational instruction independent of what
Boolean combinational operation is specified by the given Boolean
combinational instruction.

3. The apparatus of Claim 2, wherein the instructions
include Boolean comparison instructions each operating on one or
more operands to generate a Boolean result, each Boolean
comparison instruction including a Boolean result field
specifying a location, in the Boolean register set, of the
Boolean result, and wherein:

the processing means includes comparison means for executing
the Boolean comparison instructions; and

the register file is responsive to the Boolean result field
in a given Boolean instruction independent of what Boolean
comparison operation is specified by the given Boolean comparison
instruction.

4. The apparatus of Claim 1, wherein the instructions
include integer instructions each operating on one or more
integer operands to generate an integer result, each integer
instruction including one or more integer fields specifying a
location of each operand and result, and wherein:

the processing means includes integer execution means for
executing the integer instructions; and

the register file includes an integer register set of
integer registers, each integer register for holding one of said
integer operands or integer results.

5. The apparatus of Claim 4, wherein the register file 
further comprises: 

a plurality of integer register sets. 

6. The apparatus of Claim 1, wherein the instructions
include floating point instructions each operating on one or more
floating point operands to generate a floating point result, each
floating point instruction including one or more floating point
fields specifying a location of each operand and result, and
wherein:

the processing means includes floating point execution means
for executing the floating point instructions; and

the register file includes a floating point register set
of floating point registers, each floating point register for
holding one of said floating point operands or floating point
results.

7. An apparatus comprising:

means for executing Boolean instructions, the Boolean
instructions performing Boolean operations upon operands to
generate Boolean results and each Boolean instruction indicating
a destination for storage of the Boolean results of the Boolean
instruction;

a plurality of Boolean register means each for holding a
Boolean value; and

means, responsive to execution of a given Boolean
instruction by said means for executing, for storing the given
Boolean instruction's Boolean result into one of said Boolean
register means, the one Boolean register means being indicated
by said given Boolean instruction as the destination of its
Boolean result.

8. The apparatus of Claim 7, wherein the means for 
executing Boolean instructions comprises: 

numerical execution means for executing numerical comparison 
instructions to compare two multi-bit numerical operands and to 
accordingly produce a single-bit Boolean value result.

9. The apparatus of Claim 8, wherein the numerical 
execution means comprises: 

integer execution means for comparing two multi-bit integer
operands.

10. The apparatus of Claim 8, wherein the numerical
execution means comprises:

floating point execution means for comparing two multi-bit
floating point operands.

11. The apparatus of Claim 10, wherein the numerical
execution means further comprises:

integer execution means for comparing two multi-bit integer
operands.

12. The apparatus of Claim 7, wherein the means for 
executing Boolean instructions comprises: 

Boolean execution means for executing Boolean combinational
instructions to combine two Boolean value operands and to
accordingly produce a single-bit Boolean value result.

13. The apparatus of Claim 12, wherein the means for 
executing Boolean instructions further comprises: 

numerical execution means for executing numerical comparison 
instructions to compare two multi-bit numerical operands and to 
accordingly produce a single-bit Boolean value result.

14. The apparatus of Claim 13, wherein the numerical
execution means comprises:

integer execution means for comparing two multi-bit integer
operands; and

floating point execution means for comparing two multi-bit
floating point operands.

15. The apparatus of Claim 7 further comprising:

numerical register means for holding integer and floating
point values;

numerical execution means for executing numerical comparison
instructions, wherein execution of each given numerical
comparison instruction,

i) retrieves two or more multi-bit numerical operands
from respective numerical register means specified by the given
numerical comparison instruction,

ii) compares the two or more numerical operands
according to a condition specified by the given numerical
comparison instruction,

iii) produces a first single-bit Boolean value result
according to the condition,

iv) stores the first Boolean value result in a given
one of said Boolean register means as specified by the given
numerical comparison instruction,

wherein the numerical execution means includes,

i) integer execution means for comparing two multi-bit
integer operands, and

ii) floating point execution means for comparing two
multi-bit floating point operands; and

Boolean execution means for executing Boolean combinational
instructions, wherein execution of each given Boolean
combinational instruction

i) retrieves one or more Boolean value operands from
respective Boolean register means as specified by the given
Boolean combinational instruction,

ii) combines the one or more Boolean value operands
according to an operation specified by the given Boolean
combinational instruction,

iii) produces a second single-bit Boolean value result
according to the operation, and

iv) stores the second Boolean value result in a given
one of said Boolean register means as specified by the given
Boolean combinational instruction.

16. The apparatus of Claim 7, wherein:

the plurality of Boolean register means includes,

i) a first set of Boolean registers, and

ii) a second set of Boolean registers; and the
apparatus further comprises

means, coupled to the plurality of Boolean register means,
for selecting the first or the second set of Boolean registers
as a currently active set; and

the means for storing is responsive to the means for
selecting, to store results into Boolean registers in the
currently active set only.

17. An apparatus for use with a data processing system, the
data processing system including means for executing Boolean
instructions, each Boolean instruction performing a given Boolean
operation upon two or more operands to generate a one-bit Boolean
result, the apparatus comprising:

a Boolean register set including a plurality of individually
addressable one-bit registers; and

control means for writing the one-bit result of a given
Boolean instruction into one of said one-bit registers, the one
one-bit register being specified by the given Boolean
instruction's contents.

18. The apparatus of Claim 17, wherein the Boolean
instructions include Boolean combinational instructions, each
Boolean combinational instruction specifying a Boolean operation
to be performed upon a first and a second operand to generate the
result, and specifying a first address of the first operand and
a second address of the second operand and a third address of a
destination for the result, wherein:

the control means is further for reading the first and
second operands from the Boolean register set at the first and
second addresses, respectively, and wherein the one one-bit
register is specified by the third address.

19. The apparatus of Claim 18, wherein the means for
executing includes means for executing plural Boolean
instructions in parallel, wherein there may exist, in the plural

change upon the control means writing another value to the
prespecified constant Boolean register; and

enabled to execute the slave instructions before the result
of the master instruction is generated.


WO 93/01543 PCT/US92/05720

20. An apparatus comprising:

execution means for executing instructions, the instructions
performing operations upon operands to generate results, each
instruction specifying a respective source address for each
operand and a destination address for the result of the
instruction, each address specifying a register set and an
offset;

a first register set including a plurality of individually
addressable registers each for storing a value of a first data
type;

first access means for writing and reading values to and
from the first register set according to a given instruction, the
first access means including,

i) first reading means, responsive to the given
instruction having a given source address which specifies the
first register set as a source for an operand of the given
instruction, for reading the operand's value from the first
register set at the offset specified by the given source address,
and

ii) first writing means, responsive to the given
instruction having a given destination address which specifies
the first register set as a destination for the result of the
given instruction, for writing the result's value to the first
register set at the offset specified by the given destination
address;

a second register set including a plurality of individually
addressable registers each for storing a value of the first data
type; and

second access means for writing and reading values to and
from the second register set according to the given instruction,
the second access means including,

i) second reading means, responsive to the given
instruction having a given source address which specifies the
second register set as a source for an operand of the given
instruction, for reading the operand's value from the second
register set at the offset specified by the given source address,
and

ii) second writing means, responsive to the given
instruction having a given destination address which specifies
the second register set as a destination for the result of the
given instruction, for writing the result's value to the second
register set at the offset specified by the given destination
address.

21. The apparatus of Claim 20, wherein: 

a given instruction may specify a first and a second source 
address and a destination address, with each address specifying 
either of the first or second register sets such that the given 
instruction requires access to both register sets; and 

the first and second access means operate simultaneously to
provide the instruction parallel access to both the first and
second register sets.

22. In a data processing system, which includes a central
processing unit (CPU) which performs operations according to an
instruction, the operations operating upon data of a first data
type, a data register system comprising:

a first register set including a plurality of first
registers each for holding a datum of the first data type, and
including means for accessing the first registers in response to
the instruction; and

a second register set including a plurality of second
registers each for holding a datum of the first data type, and
including means for accessing the second registers in response
to the instruction.


23. The data register system of Claim 22, wherein the
instruction includes a field specifying which of the first and
second register sets is to be accessed in response to the
instruction, and wherein the data register system further
comprises:

means, responsive to the field, for accessing the first
register set or the second register set as specified by the
field.

24. An apparatus comprising:

integer execution means for executing integer instructions,
each integer instruction performing an integer operation upon one
or more integer value operands and generating an integer value
result;

floating point execution means for executing floating point
instructions, each floating point operation performing a floating
point operation upon one or more floating point value operands
and generating a floating point value result;

wherein each instruction specifies one or more sources from
which its one or more operands are to be retrieved and further
specifies a destination to which its result is to be stored, each
operation also optionally specifying an integer value base and
an integer value index;

a register bank including,

i) first register set means, having a plurality of
first registers, for holding integer values and floating point
values;

access means, coupled to the first register set means and
to both execution means, for,

i) retrieving, from any one first register, an integer
value operand for the integer execution means, a floating point
value operand for the floating point execution means, or an
integer value base or index for either execution means, as
indicated by an instruction, and

ii) for storing, into any one first register, an
integer value result from the integer execution means or a
floating point value result from the floating point execution
means, as indicated by an instruction.

25. The apparatus of Claim 24, wherein:

the register bank further comprises second register set
means, having a plurality of second registers, for holding
integer values; and

the access means is further for,

i) retrieving, from any one second register, an
integer value operand for the integer execution means, or an
integer value base or index for either execution means, as
indicated by an instruction, and

ii) for storing, into any one second register, an
integer value result from the integer execution means, as
indicated by an instruction.


26. The apparatus of Claim 25, further comprising:

Boolean execution means for executing Boolean combinational
instructions, each Boolean combinational instruction performing
a Boolean combinational operation upon one or more Boolean value
operands and generating a Boolean value result;

the register bank further comprises third register set
means, having a plurality of third registers, for holding Boolean
values; and

the access means is further for,

i) retrieving, from any one third register, a Boolean
value operand for the Boolean execution means, as indicated by
a Boolean combinational instruction, and

ii) for storing, into any one third register, a
Boolean value result from the Boolean execution means, as
indicated by a Boolean combinational instruction.

27. An apparatus, for use with a data processing system
which performs read operations and write operations upon data
values of a first data type and a first data width and upon data
values of a second data type and a second data width different
than the first data width, the data processing system specifying
a read address and data type for each read and a write address
and data content for each write, the apparatus comprising:

a register set including a plurality of individually
addressable registers, each register being wide enough to hold
a value of either data width;

read access means, responsive to the data processing system
performing a given read operation, for accessing the register set
to retrieve data contents of a given register, which is
individually addressed at the given read operation's specified
read address, and for providing to the data processing system
such portion of the retrieved data contents as the data type of
the read operation specifies; and

write access means, responsive to the data processing system
performing a given write operation, for accessing the register
set to store into a given register, which is individually
addressed at the given write operation's specified write address,
the data content specified by the write operation.

28. The apparatus of Claim 27, wherein the first data type
is floating point, the first data width is sixty-four bits, the
second data type is integer, the second data width is thirty-two
bits, and wherein:

the register set is sixty-four bits wide; and

the read and write access means respectively retrieve and
store sixty-four bits responsive to the data processing system
performing floating point operations, and thirty-two bits
responsive to the data processing system performing integer
operations.

29. An apparatus for use with a data processing system
which executes instructions, each instruction performing
operations upon one or more operands and generating a result,
wherein each instruction specifies one or more sources from which
its one or more operands are to be retrieved and further
specifies a destination to which its result is to be stored,
wherein the data processing system operates in a plurality of
modes, the apparatus comprising:

a plurality of first register means each for holding an
operand or a result;

a plurality of second register means each for holding an
operand or a result; and

switch means, responsive to the mode of the data processing
system, for providing the data processing system access to only
the plurality of first register means when the data processing
system operates in a first mode, and for providing the data
processing system access to only a first subset of the plurality
of first register means and to the plurality of second register
means when the data processing system operates in a second mode.

30. An apparatus including execution means for executing
instructions, each instruction performing operations on one or
more operands and generating a result, each instruction
specifying one or more sources which are to be accessed to read
its one or more operands and a destination which is to be
accessed to write its result, the apparatus further comprising:

a plurality of register banks;

each register bank including a plurality of register means,
each register means for storing an operand or a result, the
plurality of register means within each register bank being
arranged in a sequence such that any one given register means
within a given register bank may be accessed as an offset into
the given register bank, wherein the sources and the destination
of a given instruction are specified as offsets; and

register bank selector means for selecting a given register
bank into which the given instruction's source and destination
offsets are applied, the register bank selector means operating
independently of any contents of the given instruction.

Cross-Reference to Related Applications

Applications of particular interest to the present application include:

1. High-Performance RISC Microprocessor Architecture,
SC/Serial No. 07/727,006, filed 08 July 1991 by Le T. Nguyen et al.;

2. Extensible RISC Microprocessor Architecture, SC/Serial No.
07/727,058, filed 08 July 1991 by Le T. Nguyen et al.;

3. RISC Microprocessor Architecture with Isolated
Architectural Dependencies, SC/Serial No. 07/726,744, filed 08 July 1991
by Le T. Nguyen et al.;

4. RISC Microprocessor Architecture Implementing Fast Trap
and Exception State, SC/Serial No. 07/726,942, filed 08 July 1991 by Le T.
Nguyen et al.;

5. Single Chip Page Printer Controller, SC/Serial No.
07/726,929, filed 08 July 1991 by Derek J. Una et al.;

6. Microprocessor Architecture Capable of Supporting Multiple
Heterogeneous Processors, SC/Serial No. 07/726,893, filed 08 July 1991 by
Derek J. Lentz et al.

The above-identified Applications are hereby incorporated herein by
reference, their collective teachings being part of the present disclosure.

Background of the Invention 

Field of the Invention 

The present invention relates generally to microprocessors, and more
specifically to a RISC microprocessor having plural, symmetrical sets of
registers.

Description of the Background

In addition to the usual complement of main memory storage and
secondary permanent storage, a microprocessor-based computer system
typically also includes one or more general purpose data registers, one or
more address registers, and one or more status flags. Previous systems have
included integer registers for holding integer data and floating point registers
for holding floating point data. Typically, the status flags are used for
indicating certain conditions resulting from the most recently executed
operation. There generally are status flags for indicating whether, in the
previous operation: a carry occurred, a negative number resulted, and/or a
zero resulted.

These flags prove useful in determining the outcome of conditional
branching within the flow of program control. For example, if it is desired
to compare a first number to a second number and, upon the condition that the
two are equal, to branch to a given subroutine, the microprocessor may
compare the two numbers by subtracting one from the other, and setting or
clearing the appropriate condition flags. The numerical value of the result of
the subtraction need not be stored. A conditional branch instruction may then
be executed, conditioned upon the status of the zero flag. While being simple
to implement, this scheme lacks flexibility and power. Once the comparison
has been performed, no further numerical or other operations may be
performed before the conditional branch upon the appropriate flag; otherwise,
the intervening instructions will overwrite the condition flag values resulting
from the comparison, likely causing erroneous branching. The scheme is
further complicated by the fact that it may be desirable to form greatly
complex tests for branching, rather than the simple equality example given
above.

For example, assume that the program should branch to the subroutine
only upon the condition that a first number is greater than a second number,
and a third number is less than a fourth number, and a fifth number is equal
to a sixth number. It would be necessary for previous microprocessors to
perform a lengthy series of comparisons heavily interspersed with conditional
branches. A particularly undesirable feature of this serial scheme of
comparing and branching is observed in any microprocessor having an
instruction pipeline.

In a pipelined microprocessor, more than one instruction is being
executed at any given time, with the plural instructions being in different
stages of execution at any given moment. This provides for vastly improved
throughput. A typical pipelined microprocessor may include pipeline stages
for: (a) fetching an instruction, (b) decoding the instruction, (c) obtaining the
instruction's operands, (d) executing the instruction, and (e) storing the results.
The problem arises when a conditional branch instruction is fetched. It may
be the case that the conditional branch's condition cannot yet be tested, as the
operands may not yet be calculated, if they are to result from operations which
are yet in the pipeline. This results in a "pipeline stall", which dramatically
slows down the processor.

Another shortcoming of previous microprocessor-based systems is that
they have included only a single set of registers of any given data type. In
previous architectures, when an increased number of registers has been desired
within a given data type, the solution has been simply to increase the size of
the single set of that type of register. This may result in addressing
problems, access conflict problems, and symmetry problems.

On a similar note, previous architectures have restricted each given
register set to one respective numerical data type. Various prior systems have
allowed general purpose registers to hold either numerical data or address
"data", but the present application will not use the term "data" to include
addresses. What is intended may be best understood with reference to two
prior systems. The Intel 8085 microprocessor includes a register pair "HL"
which can be used to hold either two bytes of numerical data or one two-byte
address. The present application's improvement is not directed to that issue.
More on point, the Intel 80486 microprocessor includes a set of general
purpose integer data registers and a set of floating point registers, with each
set being limited to its respective data type, at least for purposes of direct
register usage by arithmetic and logic units.

This proves wasteful of the microprocessor's resources, such as the
available silicon area, when the microprocessor is performing operations which
do not involve both data types. For example, user applications frequently
involve exclusively integer operations, and perform no floating point
operations whatsoever. When such a user application is run on a previous
microprocessor which includes floating point registers (such as the 80486),
those floating point registers remain idle during the entire execution.

Another problem with previous microprocessor register set architecture
is observed in context switching or state switching between a user application
and a higher access privilege level entity such as the operating system kernel.
When control within the microprocessor switches context, mode, or state, the
operating system kernel or other entity to which control is passed typically
does not operate on the same data which the user application has been
operating on. Thus, the data registers typically hold data values which are not
useful to the new control entity but which must be maintained until the user
application is resumed. The kernel must generally have registers for its own
use, but typically has no way of knowing which registers are presently in use
by the user application. In order to make space for its own data, the kernel
must swap out or otherwise store the contents of a predetermined subset of the
registers. This results in considerable loss of processing time to overhead,
especially if the kernel makes repeated, short-duration assertions of control.

On a related note, in prior microprocessors, when it is required that a
"grand scale" context switch be made, it has been necessary for the
microprocessor to expend even greater amounts of processing resources,
including a generally large number of processing cycles, to save all data and
state information before making the switch. When context is switched back,
the same performance penalty has previously been paid, to restore the system
to its former state. For example, if a microprocessor is executing two user
applications, each of which requires the full complement of registers of each
data type, and each of which may be in various stages of condition code
setting operations or numerical calculations, each switch from one user
application to the other necessarily involves swapping or otherwise saving the
contents of every data register and state flag in the system. This obviously
involves a great deal of operational overhead, resulting in significant
performance degradation, particularly if the main or the secondary storage to
which the registers must be saved is significantly slower than the
microprocessor itself.

Therefore, we have discovered that it is desirable to have an improved
microprocessor architecture which allows the various component conditions of
a complex condition to be calculated without any intervening conditional
branches. We have further discovered that it is desirable that the plural simple
conditions be calculable in parallel, to improve throughput of the
microprocessor.
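
As a minimal sketch only, the six-number example given earlier may be evaluated with three comparisons and two Boolean combinations, followed by a single branch. The `Machine` class, its method names, and the register counts are hypothetical and are introduced purely for illustration; they are not the instruction set of the described processor.

```python
import operator


class Machine:
    """Hypothetical model with integer registers r[] and Boolean registers b[]."""

    def __init__(self):
        self.r = [0] * 32      # integer registers (count is an assumption)
        self.b = [False] * 32  # individually addressable Boolean registers

    def compare(self, op, src1, src2, bdest):
        # Comparison: the result goes to a named Boolean register,
        # not to a fixed status flag.
        self.b[bdest] = bool(op(self.r[src1], self.r[src2]))

    def combine(self, op, bsrc1, bsrc2, bdest):
        # Boolean combinational operation on two Boolean registers.
        self.b[bdest] = bool(op(self.b[bsrc1], self.b[bsrc2]))


m = Machine()
m.r[1:7] = [9, 4, 2, 7, 5, 5]       # first through sixth numbers in r1..r6
m.compare(operator.gt, 1, 2, 0)      # b0 = (first > second)
m.compare(operator.lt, 3, 4, 1)      # b1 = (third < fourth)
m.compare(operator.eq, 5, 6, 2)      # b2 = (fifth == sixth)
m.combine(operator.and_, 0, 1, 3)    # b3 = b0 AND b1
m.combine(operator.and_, 3, 2, 4)    # b4 = b3 AND b2
branch_taken = m.b[4]                # the only conditional branch tests b4
```

The three comparisons write to different Boolean registers and do not depend on one another, so in this sketch nothing prevents them from being issued together; only the final test of b4 needs a branch.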

We have also discovered that it is desirable to have an architecture 
which allows multiple register sets within a given data type. 

Additionally, we have discovered it to be desirable for a
microprocessor's floating point registers to be usable as integer registers, in
case the available integer registers are inadequate to optimally hold the
necessary amount of integer data. Notably, we have discovered that it is
desirable that such re-typing be completely transparent to the user application.
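
A re-typable register may be pictured, purely as a sketch, as a sixty-four-bit storage cell which yields sixty-four bits to floating point accesses and thirty-two bits to integer accesses, the widths recited in Claim 28. The class name, the register count, and the choice of the low thirty-two bits for integer data are assumptions made for illustration.

```python
import struct


class RetypableRegisterSet:
    """Hypothetical model of registers usable as integer or floating point."""

    def __init__(self, count=32):           # register count is an assumption
        self.raw = [0] * count              # each cell holds 64 raw bits

    def write_float(self, index, value):
        # Floating point access stores the full 64-bit pattern.
        self.raw[index] = struct.unpack("<Q", struct.pack("<d", value))[0]

    def read_float(self, index):
        return struct.unpack("<d", struct.pack("<Q", self.raw[index]))[0]

    def write_int(self, index, value):
        # Assumption: integer data occupies the low 32 bits of the cell.
        self.raw[index] = value & 0xFFFFFFFF

    def read_int(self, index):
        return self.raw[index] & 0xFFFFFFFF


regs = RetypableRegisterSet()
regs.write_float(3, 1.5)
as_float = regs.read_float(3)   # register 3 used as a floating point register
regs.write_int(3, 42)
as_int = regs.read_int(3)       # the same register re-used for integer data
```

The instruction type alone selects the width, so the program addresses register 3 in the same way in both cases.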

We have discovered it to be highly desirable to have a microprocessor
which provides a dedicated subset of registers which are reserved for use by
the kernel in lieu of at least a subset of the user registers, and that this new set
of registers should be addressable in exactly the same manner as the register
subset which they replace, in order that the kernel may use the same register
addressing scheme as user applications. We have further observed that it is
desirable that the switch between the two subsets of registers require no
microprocessor overhead cycles, in order to maximally utilize the
microprocessor's resources.
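
The re-routing may be sketched as follows, assuming the split given in the Abstract: registers 0 to 23 are always shared, while registers 24 to 31 are replaced by eight shadow registers in the second mode. The class name and the `kernel_mode` flag are hypothetical names chosen for this illustration.

```python
class IntegerRegisterSet:
    """Hypothetical model of an integer register set with a shadow subset."""

    def __init__(self):
        self.ra = [0] * 32        # first subset 0..23, second subset 24..31
        self.rt = [0] * 8         # shadow subset standing in for 24..31
        self.kernel_mode = False  # False: first mode, True: second mode

    def _select(self, index):
        # In the second mode, accesses to 24..31 are re-routed to the shadow.
        if self.kernel_mode and index >= 24:
            return self.rt, index - 24
        return self.ra, index

    def read(self, index):
        store, slot = self._select(index)
        return store[slot]

    def write(self, index, value):
        store, slot = self._select(index)
        store[slot] = value


regs = IntegerRegisterSet()
regs.write(30, 111)          # user routine writes register 30
regs.kernel_mode = True      # mode switch: no registers are saved
regs.write(30, 999)          # system routine "uses register 30" (shadow)
regs.write(5, 7)             # the first subset is shared in both modes
kernel_view = regs.read(30)
regs.kernel_mode = False     # mode switch back: nothing to restore
user_view = regs.read(30)
```

Both routines use the same register number, and the user's value survives without any save or restore step.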
Also, we have discovered it to be desirable to have a microprocessor
architecture which allows for a "grand scale" context switch to be performed
with minimal overhead. In this vein, we have discovered that it is desirable to
have an architecture which allows for plural banks of register sets of each
type, such that two or more user applications may be operating in a
multi-tasking environment, or other "simultaneous" mode, with each user
application having sole access to at least a full bank of registers. It is our
discovery that the register addressing scheme should, desirably, not differ
between user applications, nor between register banks, to maximize simplicity
of the user applications, and that the system should provide hardware support
for switching between the register banks so that the user applications need not
be aware of which register bank they are presently using or even of the
existence of other register banks or of other user applications.

These and other advantages of our invention will be appreciated with
reference to the following description of our invention, the accompanying
drawings, and the claims.


Summary of the Invention 


The present invention provides a register file system comprising: an
integer register set including first and second subsets of integer registers, and
a shadow subset; a re-typable set of registers which are individually usable as
integer registers or as floating point registers; and a set of individually
addressable Boolean registers.

The present invention includes integer and floating point functional
units which execute integer instructions accessing the integer register set, and
which operate in a plurality of modes. In any mode, instructions are granted
ordinary access to the first subset of integer registers. In a first mode,
instructions are also granted ordinary access to the second subset. However,
in a second mode, instructions attempting to access the second subset are
instead granted access to the shadow subset, in a manner which is transparent
to the instructions. Thus, routines may be written without regard to which
mode they will operate in, and system routines (which operate in the second
mode) can have at least the second subset seemingly at their disposal, without
having to expend the otherwise-required overhead of saving the second
subset's contents (which may be in use by user processes operating in the first
mode).

The invention further includes a plurality of integer register sets, which
are individually addressable as specified by fields in instructions. The register
sets include read ports and write ports which are accessed by multiplexers,
wherein the multiplexers are controlled by contents of the register
set-specifying fields in the instructions.

One of the integer register sets is also usable as a floating point register
set. In one embodiment, this set is sixty-four bits wide to hold
double-precision floating point data, but only the low order thirty-two bits are
used by integer instructions.

The invention includes functional units for performing Boolean
operations, and further includes a Boolean register set for holding results of
the Boolean operations such that no dedicated, fixed-location status flags are
required. The integer and floating point functional units execute numerical
comparison instructions, which specify individual ones of the Boolean registers
to hold results of the comparisons. A Boolean functional unit executes
Boolean combinational instructions whose sources and destination are specified
registers in the Boolean register set. Thus, the present invention may perform
conditional branches upon a single result of a complex Boolean function
without intervening conditional branch instructions between the fundamental
parts of the complex Boolean function, minimizing pipeline disruption in the
data processor.

Finally, there are multiple, identical register banks in the system, each
bank including the above-described register sets. A bank may be allocated to
a given process or routine, such that the instructions within the routine need
not specify upon which bank they operate.

Brief Description of the Drawings

Fig. 1 is a block diagram of the instruction execution unit of the 
microprocessor of the present invention, showing the elements of the register 
file. 

Figs. 2-4 are simplified schematic and block diagrams of the floating 
point, integer and Boolean portions of the instruction execution unit of Fig. 1, 
respectively. 

Figs. 5-6 are more detailed views of the floating point and integer 
portions, respectively, showing the means for selecting between register sets. 

Fig. 7 illustrates the fields of an exemplary microprocessor instruction 
word executable by the instruction execution unit of Fig. 1. 

Detailed Description of the Preferred Embodiments 
I. Register File 

Fig. 1 illustrates the basic components of the instruction execution unit 
(IEU) 10 of the RISC (reduced instruction set computing) processor of the 
present invention. The IEU 10 includes a register file 12 and an execution 
engine 14. The register file 12 includes one or more register banks 16-0 to 
16-n. It will be understood that the structure of each register bank 16 is 
identical to all of the other register banks 16. Therefore, the present 
application will describe only register bank 16-0. The register bank includes 
a register set A 18, a register set FB 20, and a register set C 22. 

In general, the invention may be characterized as a RISC 
microprocessor having a register file optimally configured for use in the 
execution of RISC instructions, as opposed to conventional register files which 
are sufficient for use in the execution of CISC (complex instruction set 
computing) instructions by CISC processors. By having a specially adapted 
register file, the execution engine of the microprocessor's IEU achieves greatly 
improved performance, both in terms of resource utilization and in terms of
raw throughput. The general concept is to tune a register set to a RISC 
instruction, while the specific implementation may involve any of the register 
sets in the architecture. 

A. Register Set A

Register set A 18 includes integer registers 24 (RA[31:0]), each of
which is adapted to hold an integer value datum. In one embodiment, each
integer may be thirty-two bits wide. The RA[] integer registers 24 include a
first plurality 26 of integer registers (RA[23:0]) and a second plurality 28 of
integer registers (RA[31:24]). The RA[] integer registers 24 are each of
identical structure, and are each addressable in the same manner, albeit with
a unique address within the integer register set 24. For example, a first
integer register 30 (RA[0]) is addressable at a zero offset within the integer
register set 24.

RA[0] always contains the value zero. It has been observed that user
applications and other programs use the constant value zero more than any
other constant value. It is, therefore, desirable to have a zero readily available
at all times, for clearing, comparing, and other purposes. Another advantage
of having a constant, hard-wired value in a given register, regardless of the
particular value, is that the given register may be used as the destination of
any instruction whose results need not be saved.

Also, this means that the fixed register will never be the cause of a
data dependency delay. A data dependency exists when a "slave" instruction
requires, for one or more of its operands, the result of a "master" instruction.
In a pipelined processor, this may cause pipeline stalls. For example, the
master instruction, although occurring earlier in the code sequence than the
slave instruction, may take considerably longer to execute. It will be readily
appreciated that if a slave "increment and store" instruction operates on the
result data of a master "quadruple-word integer divide" instruction, the slave
instruction will be fetched, decoded, and awaiting execution many clock cycles
before the master instruction has finished execution. However, in certain
instances, the numerical result of a master instruction is not needed, and the
master instruction is executed for some other purpose only, such as to set
condition code flags. If the master instruction's destination is RA[0], the
numerical results will be effectively discarded. The data dependency checker
(not shown) of the IEU 10 will not cause the slave instruction to be delayed,
as the ultimate result of the master instruction (zero) is already known.

The integer register set A 24 also includes a set of shadow registers 32
(RT[31:24]). Each shadow register can hold an integer value, and is, in one
embodiment, also thirty-two bits wide. Each shadow register is addressable
as an offset in the same manner in which each integer register is addressable.

Finally, the register set A includes an IEU mode integer switch 34.
The switch 34, like other such elements, need not have a physical embodiment
as a switch, so long as the corresponding logical functionality is provided
within the register sets. The IEU mode integer switch 34 is coupled to the
first subset 26 of integer registers on line 36, to the second subset of integer
registers 28 on line 38, and to the shadow registers 32 on line 40. All
accesses to the register set A 18 are made through the IEU mode integer
switch 34 on line 42. Any access request to read or write a register in the
first subset RA[23:0] is passed automatically through the IEU mode integer
switch 34. However, accesses to an integer register with an offset outside the
first subset RA[23:0] will be directed either to the second subset RA[31:24]
or the shadow registers RT[31:24], depending upon the operational mode of
the execution engine 14.

The IEU mode integer switch 34 is responsive to a mode control unit
44 in the execution engine 14. The mode control unit 44 provides pertinent
state or mode information about the IEU 10 to the IEU mode integer switch
34 on line 46. When the execution engine performs a context switch such as
a transfer to kernel mode, the mode control unit 44 controls the IEU mode
integer switch 34 such that any requests to the second subset RA[31:24] are
re-directed to the shadow RT[31:24], using the same requested offset within
the integer set. Any operating system kernel or other then-executing entity
may thus have apparent access to the second subset RA[31:24] without the
otherwise-required overhead of swapping the contents of the second subset
RA[31:24] out to main memory, or pushing the second subset RA[31:24] onto
a stack, or other conventional register-saving technique.

When the execution engine 14 returns to normal user mode and control
passes to the originally-executing user application, the mode control unit 44
controls the IEU mode integer switch 34 such that access is again directed to
the second subset RA[31:24]. In one embodiment, the mode control unit 44
is responsive to the present state of interrupt enablement in the IEU 10. In
one embodiment, the execution engine 14 includes a processor status register
(PSR) (not shown), which includes a one-bit flag (PSR[7]) indicating whether
interrupts are enabled or disabled. Thus, the line 46 may simply couple the
IEU mode integer switch 34 to the interrupts-enabled flag in the PSR. While
interrupts are disabled, the IEU 10 maintains access to the integers RA[23:0],
in order that it may readily perform analysis of various data of the user
application. This may allow improved debugging, error reporting, or system
performance analysis.
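
The routing performed by the IEU mode integer switch 34 can be illustrated with a small software model. The following Python sketch is illustrative only and is not part of the described hardware; the class name `RegisterSetA`, the mode constants, and the method names are assumptions introduced for this example. It models RA[0] as a hard-wired zero and re-routes offsets 24 through 31 to the shadow registers while the second mode is active.

```python
USER_MODE = "user"      # first mode: interrupts enabled
KERNEL_MODE = "kernel"  # second mode: interrupts disabled


class RegisterSetA:
    """Toy model of RA[31:0] plus the shadow registers RT[31:24]."""

    def __init__(self):
        self.ra = [0] * 32   # RA[31:0]
        self.rt = [0] * 8    # RT[31:24], held at list indices 0..7
        self.mode = USER_MODE

    def _uses_shadow(self, offset):
        # Only offsets 24..31 are re-routed, and only in the second mode.
        return offset >= 24 and self.mode == KERNEL_MODE

    def read(self, offset):
        if offset == 0:
            return 0  # RA[0] always reads as zero
        if self._uses_shadow(offset):
            return self.rt[offset - 24]
        return self.ra[offset]

    def write(self, offset, value):
        if offset == 0:
            return  # results written to RA[0] are discarded
        value &= 0xFFFFFFFF  # registers are thirty-two bits wide
        if self._uses_shadow(offset):
            self.rt[offset - 24] = value
        else:
            self.ra[offset] = value


regs = RegisterSetA()
regs.write(5, 111)    # first subset, visible in both modes
regs.write(30, 222)   # second subset, written in user mode
regs.mode = KERNEL_MODE
regs.write(30, 333)   # same offset, but lands in RT[30]
kernel_view = (regs.read(5), regs.read(30))
regs.mode = USER_MODE
user_view = (regs.read(5), regs.read(30))
```

In this model the kernel-mode write to offset 30 leaves the user's value untouched, so no save or restore of RA[31:24] is needed when the mode changes back.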



B. Register Set FB 

The re-typable register set FB 20 may be thought of as including
floating point registers 48 (RF[31:0]), and/or integer registers 50 (RB[31:0]).
When neither data type is implied to the exclusion of the other, this application
will use the term RFB[]. In one embodiment, the floating point registers RF[]
occupy the same physical silicon space as the integer registers RB[]. In one
embodiment, the floating point registers RF[] are sixty-four bits wide and the
integer registers RB[] are thirty-two bits wide. It will be understood that if
double-precision floating point numbers are not required, the register set
RFB[] may advantageously be constructed in a thirty-two-bit width to save the
silicon area otherwise required by the extra thirty-two bits of each floating
point register.

Each individual register in the register set RFB[] may hold either a
floating point value or an integer value. The register set RFB[] may include
optional hardware for preventing accidental access of a floating point value as
though it were an integer value, and vice versa. In one embodiment,
however, in the interest of simplifying the register set RFB[], it is simply left
to the software designer to ensure that no erroneous usages of individual
registers are made. Thus, the execution engine 14 simply makes an access
request on line 52, specifying an offset into the register set RFB[], without
specifying whether the register at the given offset is intended to be used as a
floating point register or an integer register. Within the execution engine 14,
various entities may use either the full sixty-four bits provided by the register
set RFB[], or may use only the low order thirty-two bits, such as in integer
operations or single-precision floating point operations.

A first register RFB[0] 51 contains the constant value zero, in a form
such that RB[0] is a thirty-two-bit integer zero (0000 hex) and RF[0] is a
sixty-four-bit floating point zero (00000000 hex). This provides the same
advantages as described above for RA[0].
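
The re-typable behavior of RFB[] can be sketched in software as a single array of sixty-four-bit patterns that is viewed either as doubles or as thirty-two-bit integers. The Python model below is a hedged illustration, not the hardware design; the class and method names are hypothetical, and the choice to leave the upper thirty-two bits unchanged on an integer write is an assumption, since the text only states that integer instructions use the low order thirty-two bits.

```python
import struct

LOW32 = 0xFFFFFFFF


class RegisterSetFB:
    """Toy model of RFB[31:0]: one storage array, two typed views."""

    def __init__(self):
        self.raw = [0] * 32  # each entry is a raw 64-bit pattern

    def write_float(self, offset, value):
        if offset != 0:  # RFB[0] stays zero
            self.raw[offset] = struct.unpack("<Q", struct.pack("<d", value))[0]

    def read_float(self, offset):
        return struct.unpack("<d", struct.pack("<Q", self.raw[offset]))[0]

    def write_int(self, offset, value):
        if offset != 0:
            # Assumption: an integer write replaces only the low word.
            upper = self.raw[offset] & ~LOW32
            self.raw[offset] = upper | (value & LOW32)

    def read_int(self, offset):
        return self.raw[offset] & LOW32


fb = RegisterSetFB()
fb.write_float(3, 1.5)
fb.write_int(4, 7)
as_float = fb.read_float(3)
as_int = fb.read_int(4)
# No type checking is modeled: reading a float register as an integer
# simply returns the low word of its bit pattern.
misread = fb.read_int(3)
```

The last line shows why, in the embodiment without checking hardware, it falls to the software designer to keep track of which view each register holds.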

C. Register Set C 

The register set C 22 includes a plurality of Boolean registers 54
(RC[31:0]). RC[] is also known as the "condition status register" (CSR). The
Boolean registers RC[] are each identical in structure and addressing, albeit
that each is individually addressable at a unique address or offset within RC[].

In one embodiment, register set C further includes a "previous
condition status register" (PCSR) 60, and the register set C also includes a
CSR selector unit 62, which is responsive to the mode control unit 44 to select
alternatively between the CSR 54 and the PCSR 60. In the one embodiment,
the CSR is used when interrupts are enabled, and the PCSR is used when
interrupts are disabled. The CSR and PCSR are identical in all other respects.
In the one embodiment, when interrupts are set to be disabled, the CSR
selector unit 62 pushes the contents of the CSR into the PCSR, overwriting the
former contents of the PCSR, and when interrupts are re-enabled, the CSR
selector unit 62 pops the contents of the PCSR back into the CSR. In other
embodiments it may be desirable to merely alternate access between the CSR
and the PCSR, as is done with RA[31:24] and RT[31:24]. In any event, the
PCSR is always available as a thirty-two-bit "special register".
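
The push and pop behavior of the CSR selector unit 62 can be sketched as follows. This Python model is illustrative only; the class and method names are hypothetical. It assumes one possible reading of the embodiment, namely that code running with interrupts disabled writes its comparison results into the CSR while the PCSR holds the saved copy. The text does not spell out this detail, so the sketch should not be taken as the definitive behavior.

```python
class ConditionRegisters:
    """Toy model of the CSR, the PCSR, and the push/pop on mode change."""

    def __init__(self):
        self.csr = 0   # RC[31:0] packed into one 32-bit word
        self.pcsr = 0  # previous condition status register
        self.interrupts_enabled = True

    def disable_interrupts(self):
        if self.interrupts_enabled:
            self.pcsr = self.csr  # push: overwrites the former PCSR contents
            self.interrupts_enabled = False

    def enable_interrupts(self):
        if not self.interrupts_enabled:
            self.csr = self.pcsr  # pop: restore the saved conditions
            self.interrupts_enabled = True


rc = ConditionRegisters()
rc.csr = 0b1010              # Boolean results left by a user routine
rc.disable_interrupts()      # entering the handler saves them
saved = rc.pcsr
rc.csr = 0b0100              # assumed: handler comparisons write the CSR
rc.enable_interrupts()       # leaving the handler restores them
restored = rc.csr
```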

None of the Boolean registers is a dedicated condition flag, unlike the
Boolean registers in previously known microprocessors. That is, the CSR 54
does not include a dedicated carry flag, nor a dedicated minus flag, nor a
dedicated flag indicating equality of a comparison or a zero subtraction result.
Rather, any Boolean register may be the destination of the Boolean result of
any Boolean operation. As with the other register sets, a first Boolean register
58 (RC[0]) always contains the value zero, to obtain the advantages explained
above for RA[0]. In the preferred embodiment, each Boolean register is one
bit wide, indicating one Boolean value.

II. Execution Engine

The execution engine 14 includes one or more integer functional units
66, one or more floating point functional units 68, and one or more Boolean
functional units 70. The functional units execute instructions as will be
explained below. Buses 72, 73, and 75 connect the various elements of the
IEU 10, and will each be understood to represent data, address, and control
paths.

A. Instruction Format 

Fig. 7 illustrates one exemplary format for an integer instruction which
the execution engine 14 may execute. It will be understood that not all
instructions need to adhere strictly to the illustrated format, and that the data
processing system includes an instruction fetcher and decoder (not shown)
which are adapted to operate upon varying format instructions. The single
example of Fig. 7 is for ease in explanation only. Throughout this Application
the identification I[] will be used to identify various bits of the instruction.
I[31:30] are reserved for future implementations of the execution engine 14.
I[29:26] identify the instruction class of the particular instruction. Table 1
shows the various classes of instructions performed by the present invention.



Table 1
Instruction Classes

Class    Instructions
0-3      Integer and floating point register-to-register instructions
4        Immediate constant load
5        Reserved
6        Load
7        Store
8-11     Control Flow
12       Modifier
13       Boolean operations
14       Reserved
15       Atomic (extended)


Instruction classes of particular interest to this Application include the
Class 0-3 register-to-register instructions and the Class 13 Boolean operations.
While other classes of instructions also operate upon the register file 12,
further discussion of those classes is not believed necessary in order to fully
understand the present invention.

I[25] is identified as B0, and indicates whether the destination register
is in register set A or register set B. I[24:22] are an opcode which identifies,
within the given instruction class, which specific function is to be performed.
For example, within the register-to-register classes, an opcode may specify
"addition". I[21] identifies the addressing mode which is to be used when
performing the instruction: either register source addressing or immediate
source addressing. I[20:16] identify the destination register as an offset within
the register set indicated by B0. I[15] is identified as B1 and indicates
whether the first operand is to be taken from register set A or register set B.
I[14:10] identify the register offset from which the first operand is to be taken.
I[9:8] identify a function selection, an extension of the opcode I[24:22].
I[7:6] are reserved. I[5] is identified as B2 and indicates whether a second
operand of the instruction is to be taken from register set A or register set B.
Finally, I[4:0] identify the register offset from which the second operand is to
be taken.
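
The field layout just described can be captured in a short decoding sketch. The Python code below is illustrative only; the dictionary keys such as `instr_class` and `addr_mode` are names chosen for this example rather than terms from Fig. 7, and the field values used in the demonstration are arbitrary and carry no assigned meaning.

```python
# (high bit, low bit) of each field of the 32-bit instruction word I[].
FIELDS = {
    "instr_class": (29, 26),
    "b0": (25, 25),         # destination register set: A or B
    "opcode": (24, 22),
    "addr_mode": (21, 21),  # register source or immediate source
    "dest": (20, 16),
    "b1": (15, 15),         # first operand register set
    "src1": (14, 10),
    "func": (9, 8),         # function select, extends the opcode
    "b2": (5, 5),           # second operand register set
    "src2": (4, 0),
}


def decode(word):
    """Split an instruction word into its named fields."""
    out = {}
    for name, (hi, lo) in FIELDS.items():
        width = hi - lo + 1
        out[name] = (word >> lo) & ((1 << width) - 1)
    return out


def encode(**values):
    """Inverse of decode; reserved bits I[31:30] and I[7:6] stay zero."""
    word = 0
    for name, value in values.items():
        hi, lo = FIELDS[name]
        width = hi - lo + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << lo
    return word


fields = dict(instr_class=1, b0=0, opcode=2, addr_mode=0, dest=7,
              b1=1, src1=3, func=1, b2=0, src2=30)
word = encode(**fields)
decoded = decode(word)
```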

With reference to Fig. 1, the integer functional unit 66 and floating
point functional unit 68 are equipped to perform integer comparison
instructions and floating point comparisons, respectively. The instruction
format for the comparison instruction is substantially identical to that shown
in Fig. 7, with the caveat that various fields may advantageously be identified
by slightly different names. I[20:16] identifies the destination register where
the result is to be stored, but the addressing mode field I[21] does not select
between register sets A or B. Rather, the addressing mode field indicates
whether the second source of the comparison is found in a register or is
immediate data. Because the comparison is a Boolean type instruction, the
destination register is always found in register set C. All other fields function
as shown in Fig. 7. In performing Boolean operations within the integer and
floating point functional units, the opcode and function select fields identify
which Boolean condition is to be tested for in comparing the two operands.
The integer and the floating point functional units fully support the IEEE
standards for numerical comparisons.

The IEU 10 is a load/store machine. This means that when the
contents of a register are stored to memory or read from memory, an address
calculation must be performed in order to determine which location in memory
is to be the destination or the source of the store or load, respectively. When
this is the case, the destination register field I[20:16] identifies the register
which is the destination or the source of the load or store, respectively. The
source register 1 field, I[14:10], identifies a register in either set A or B which
contains a base address of the memory location. In one embodiment, the
source register 2 field, I[4:0], identifies a register in set A or set B which
contains an index or an offset from the base. The load/store address is
calculated by adding the index to the base. In another mode, I[7:0] include
immediate data which are to be added as an index to the base.

B. Operation of the Instruction Execution Unit and Register Sets

It will be understood by those skilled in the art that the integer
functional unit 66, the floating point functional unit 68, and the Boolean
functional unit 70 are responsive to the contents of the instruction class field,
the opcode field, and the function select field of a present instruction being
executed.


1. Integer Operations

For example, when the instruction class, the opcode, and function
select indicate that an integer register-to-register addition is to be performed,
the integer functional unit may be responsive thereto to perform the indicated
operation, while the floating point functional unit and the Boolean functional
unit may be responsive thereto to not perform the operation. As will be
understood from the cross-referenced applications, however, the floating point
functional unit 68 is equipped to perform both floating point and integer
operations. Also, the functional units are constructed to each perform more
than one instruction simultaneously.

The integer functional unit 66 performs integer functions only. Integer
operations typically involve a first source, a second source, and a destination.
A given integer instruction will specify a particular operation to be performed
on one or more source operands and will specify that the result of the integer
operation is to be stored at a given destination. In some instructions, such as
address calculations employed in load/store operations, the sources are utilized
as a base and an index. The integer functional unit 66 is coupled to a first bus
72 over which the integer functional unit 66 is connected to a switching and
multiplexing control (SMC) unit A 74 and an SMC unit B 76. Each integer
instruction executed by the integer functional unit 66 will specify whether each
of its sources and destination reside in register set A or register set B.

Suppose that the IEU 10 has received, from the instruction fetch unit
(not shown), an instruction to perform an integer register-to-register addition.
In various embodiments, the instruction may specify a register bank, perhaps
even a separate bank for each source and destination. In one embodiment, the
instruction I[] is limited to a thirty-two-bit length, and does not contain any
indication of which register bank 16-0 through 16-n is involved in the
instruction. Rather, the bank selector unit 78 controls which register bank is
presently active. In one embodiment, the bank selector unit 78 is responsive
to one or more bank selection bits in a status word (not shown) within the IEU
10.

In order to perform the integer addition instruction, the integer
functional unit 66 is responsive to the identification in I[14:10] and I[4:0] of
the first and second source registers. The integer functional unit 66 places an
identification of the first and second source registers at ports S1 and S2,
respectively, onto the integer functional unit bus 72 which is coupled to both
SMC units A and B 74 and 76. In one embodiment, the SMC units A and B
are each coupled to receive B0-2 from the instruction. In one embodiment,
a zero in any respective Bn indicates register set A, and a one indicates
register set B. During load/store operations, the source ports of the integer
and floating point functional units 66 and 68 are utilized as a base port and an
index port, B and I, respectively.

After obtaining the first and second operands from the indicated
register sets on the bus 72, as explained below, the integer functional unit 66
performs the indicated operation upon those operands, and provides the result
at port D onto the integer functional unit bus 72. The SMC units A and B are
responsive to B0 to route the result to the appropriate register set A or B.
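
The operand and result routing described above can be sketched as a function that selects a register set per operand from the B0, B1, and B2 bits. This Python fragment is a simplified illustration, not the SMC unit design; the function name and the dictionary-based representation are assumptions made for the example, as is the wraparound of the sum to thirty-two bits.

```python
SET_A, SET_B = 0, 1  # a zero in Bn selects register set A, a one selects B


def integer_add(register_sets, fields):
    """Perform dest := src1 + src2 with per-operand register set selection.

    register_sets maps SET_A and SET_B to 32-entry lists of integers.
    fields holds b0, dest, b1, src1, b2 and src2 from the instruction.
    """
    first = register_sets[fields["b1"]][fields["src1"]]
    second = register_sets[fields["b2"]][fields["src2"]]
    result = (first + second) & 0xFFFFFFFF  # assumed 32-bit wraparound
    if fields["dest"] != 0:  # register 0 of each set stays zero
        register_sets[fields["b0"]][fields["dest"]] = result
    return result


register_sets = {SET_A: [0] * 32, SET_B: [0] * 32}
register_sets[SET_A][3] = 40
register_sets[SET_B][4] = 2
# RB[9] := RA[3] + RB[4]
total = integer_add(register_sets,
                    dict(b0=SET_B, dest=9, b1=SET_A, src1=3, b2=SET_B, src2=4))
```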

The SMC unit B is further responsive to the instruction class, opcode,
and function selection to control whether operands are read from (or results
are stored to) either a floating point register RF[] or an integer register RB[].
As indicated, in one embodiment, the registers RF[] may be sixty-four bits
wide while the registers RB[] are only thirty-two bits wide. Thus, SMC
unit B controls whether a word or a double word is written to the register set
RFB[]. Because all registers within register set A are thirty-two bits wide,
SMC unit A need not include means for controlling the width of data transfer
on the bus 42.

All data on the bus 42 are thirty-two bits wide, but other sorts of
complexities exist within register set A. The IEU mode integer switch 34 is
responsive to the mode control unit 44 of the execution engine 14 to control
whether data on the bus 42 are connected through to bus 36, bus 38 or bus 40,
and vice versa.

IEU mode integer switch 34 is further responsive to I[20:16], I[14:10],
and I[4:0]. If a given indicated destination or source is in RA[23:0], the IEU
mode integer switch 34 automatically couples the data between lines 42 and
36. However, for registers RA[31:24], the IEU mode integer switch 34
determines whether data on line 42 is connected to line 38 or line 40, and vice
versa. When interrupts are enabled, IEU mode integer switch 34 connects the
SMC unit A to the second subset 28 of integer registers RA[31:24]. When
interrupts are disabled, the IEU mode integer switch 34 connects the SMC unit
A to the shadow registers RT[31:24]. Thus, an instruction executing within
the integer functional unit 66 need not be concerned with whether to address
RA[31:24] or RT[31:24]. It will be understood that SMC unit A may
advantageously operate identically whether it is being accessed by the integer
functional unit 66 or by the floating point functional unit 68.

2. Floating Point Operations

The floating point functional unit 68 is responsive to the class, opcode,
and function select fields of the instruction, to perform floating point
operations. The S1, S2, and D ports operate as described for the integer
functional unit 66. SMC unit B is responsive to retrieve floating point
operands from, and to write numerical floating point results to, the floating
point registers RF[] on bus 52.

3. Boolean Operations

SMC unit C 80 is responsive to the instruction class, opcode, and
function select fields of the instruction. When SMC unit C detects that a
comparison operation has been performed by one of the numerical functional
units 66 or 68, it writes the Boolean result over bus 56 to the Boolean register
indicated at the D port of the functional unit which performed the comparison.

The Boolean functional unit 70 does not perform comparison
instructions as do the integer and floating point functional units 66 and 68.
Rather, the Boolean functional unit 70 is only used in performing bitwise
logical combination of Boolean register contents, according to the Boolean
functions listed in Table 2.

Table 2
Boolean Functions

I[23,22,9,8]    Boolean result calculation
0000            ZERO
0001            S1 AND S2
0010            S1 AND (NOT S2)
0011            S1
0100            (NOT S1) AND S2
0101            S2
0110            S1 XOR S2
0111            S1 OR S2
1000            S1 NOR S2
1001            S1 XNOR S2
1010            NOT S2
1011            S1 OR (NOT S2)
1100            NOT S1
1101            (NOT S1) OR S2
1110            S1 NAND S2
1111            ONE
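
One way to read the codes in Table 2 is that each four-bit code is itself the truth table of the function it selects. The Python sketch below is an observation about the table, offered as an illustration rather than as a statement of how the Boolean functional unit 70 is built; the function name `boolean_op` and the bit-position convention are assumptions introduced here. The sketch checks the convention against every row of the table.

```python
def boolean_op(code, s1, s2):
    """Evaluate the Table 2 function selected by a 4-bit code.

    Bit 0 of the code is the result for (S1, S2) = (1, 1), bit 1 for (1, 0),
    bit 2 for (0, 1), and bit 3 for (0, 0).
    """
    position = ((1 - s1) << 1) | (1 - s2)
    return (code >> position) & 1


# Table 2 written out explicitly, one lambda per row.
TABLE2 = {
    0b0000: lambda a, b: 0,
    0b0001: lambda a, b: a & b,
    0b0010: lambda a, b: a & (1 - b),
    0b0011: lambda a, b: a,
    0b0100: lambda a, b: (1 - a) & b,
    0b0101: lambda a, b: b,
    0b0110: lambda a, b: a ^ b,
    0b0111: lambda a, b: a | b,
    0b1000: lambda a, b: 1 - (a | b),
    0b1001: lambda a, b: 1 - (a ^ b),
    0b1010: lambda a, b: 1 - b,
    0b1011: lambda a, b: a | (1 - b),
    0b1100: lambda a, b: 1 - a,
    0b1101: lambda a, b: (1 - a) | b,
    0b1110: lambda a, b: 1 - (a & b),
    0b1111: lambda a, b: 1,
}

# Collect every (code, s1, s2) where the two formulations disagree.
mismatches = [
    (code, s1, s2)
    for code, fn in TABLE2.items()
    for s1 in (0, 1)
    for s2 in (0, 1)
    if boolean_op(code, s1, s2) != fn(s1, s2)
]
```

Under this reading, a single four-input lookup would suffice for all sixteen functions, which is consistent with the table but is not stated in the text.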


The advantage which the present invention obtains by having a plurality 
of homogenous Boolean registers, each of which is individually addressable 
as the destination of a Boolean operation, will be explained with reference to 
Tables 3-5. Table 3 illustrates an example of a segment of code which 
performs a conditional branch based upon a complex Boolean function. The 
complex Boolean function includes three portions which are OR-ed together. 
The first portion includes two sub-portions, which are AND-ed together. 

Table 3
Example of Complex Boolean Function

1   RA[1] := 0;
2   IF (((RA[2] = RA[3]) AND (RA[4] > RA[5])) OR
3       (RA[6] < RA[7]) OR
4       (RA[8] <> RA[9])) THEN
5       X()
6   ELSE
7       Y();
8   RA[10] := 1;


Table 4 illustrates, in pseudo-assembly form, one likely method by
which previous microprocessors would perform the function of Table 3. The
code in Table 4 is written as though it were constructed by a compiler of at
least normal intelligence operating upon the code of Table 3. That is, the
compiler will recognize that the condition expressed in lines 2-4 of Table 3 is
passed if any of the three portions is true.

Table 4
Execution of Complex Boolean Function
Without Boolean Register Set

1   START      LDI   RA[1],0
2   TEST1      CMP   RA[2],RA[3]
3              BNE   TEST2
4              CMP   RA[4],RA[5]
5              BGT   DO_IF
6   TEST2      CMP   RA[6],RA[7]
7              BLT   DO_IF
8   TEST3      CMP   RA[8],RA[9]
9              BEQ   DO_ELSE
10  DO_IF      JSR   ADDRESS OF X()
11             JMP   PAST_ELSE
12  DO_ELSE    JSR   ADDRESS OF Y()
13  PAST_ELSE  LDI   RA[10],1



The assignment at line 1 of Table 3 is performed by the "load
immediate" statement at line 1 of Table 4. The first portion of the complex
Boolean condition, expressed at line 2 of Table 3, is represented by the
statements in lines 2-5 of Table 4. To test whether RA[2] equals RA[3], the
compare statement at line 2 of Table 4 performs a subtraction of RA[2] from
RA[3] or vice versa, depending upon the implementation, and may or may not
store the result of that subtraction. The important function performed by the
comparison statement is that the zero, minus, and carry flags will be
appropriately set or cleared.

The conditional branch statement at line 3 of Table 4 branches to a
subsequent portion of code upon the condition that RA[2] did not equal RA[3].
If the two were unequal, the zero flag will be clear, and there is no need to
perform the second sub-portion. The existence of the conditional branch
statement at line 3 of Table 4 prevents the further fetching, decoding, and
executing of any subsequent statement in Table 4 until the results of the
comparison in line 2 are known, causing a pipeline stall. If the first
sub-portion of the first portion (TEST1) is passed, the second sub-portion at line
4 of Table 4 then compares RA[4] to RA[5], again setting and clearing the
appropriate status flags.

If RA[2] equals RA[3], and RA[4] is greater than RA[5], there is no
need to test the remaining two portions (TEST2 and TEST3) in the complex
Boolean function, and the statement at Table 4, line 5, will conditionally
branch to the label DO_IF, to perform the operation inside the "IF" of Table
3. However, if the first portion of the test is failed, additional processing is
required to determine which of the "IF" and "ELSE" portions should be
executed.

The second portion of the Boolean function is the comparison of RA[6]
to RA[7], at line 6 of Table 4, which again sets and clears the appropriate
status flags. If the condition "less than" is indicated by the status flags, the
complex Boolean function is passed, and execution may immediately branch
to the DO_IF label. In various prior microprocessors, the "less than"
condition may be tested by examining the minus flag. If RA[6] was not less
than RA[7], the third portion of the test must be performed. The statement at
line 8 of Table 4 compares RA[8] to RA[9]. If this comparison is failed, the
"ELSE" code should be executed; otherwise, execution may simply fall
through to the "IF" code at line 10 of Table 4, which is followed by an
additional jump around the "ELSE" code. Each of the conditional branches
in Table 4, at lines 3, 5, 7 and 9, results in a separate pipeline stall,
significantly increasing the processing time required for handling this complex
Boolean function.
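
The control flow of Table 4 can be traced with a short simulation. The following Python sketch is illustrative only and is not part of the described hardware: the function name `run_table4` and the branch counter are assumptions introduced here, and the register file is modelled as a plain dictionary. It follows the compare-and-branch order of Table 4 and counts the conditional branches executed on a given path, each of which, per the discussion above, incurs a pipeline stall.

```python
def run_table4(ra):
    """Trace the flag-and-branch sequence of Table 4.

    `ra` maps register numbers to integer values. Returns the name of the
    routine reached ("X" or "Y") and the number of conditional branches
    executed along the way.
    """
    branches = 0

    # Lines 2-3: CMP RA[2],RA[3]; BNE TEST2
    branches += 1
    if ra[2] == ra[3]:
        # Lines 4-5: CMP RA[4],RA[5]; BGT DO_IF
        branches += 1
        if ra[4] > ra[5]:
            return "X", branches

    # Lines 6-7 (TEST2): CMP RA[6],RA[7]; BLT DO_IF
    branches += 1
    if ra[6] < ra[7]:
        return "X", branches

    # Lines 8-9 (TEST3): CMP RA[8],RA[9]; BEQ DO_ELSE
    branches += 1
    if ra[8] == ra[9]:
        return "Y", branches
    return "X", branches
```

In this model, a register file in which every register holds zero executes all four conditional branches before reaching Y(), which is the worst case for the sequence of Table 4.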

The greatly improved throughput which results from employing the
Boolean register set C of the present invention will now readily be seen with
specific reference to Table 5.

Table 5

Execution of Complex Boolean Function
With Boolean Register Set

1   START      LDI   RA[1],0
2   TEST1      CMP   RC[11],RA[2],RA[3],EQ
3              CMP   RC[12],RA[4],RA[5],GT
4   TEST2      CMP   RC[13],RA[6],RA[7],LT
5   TEST3      CMP   RC[14],RA[8],RA[9],NE
6   COMPLEX    AND   RC[15],RC[11],RC[12]
7              OR    RC[16],RC[13],RC[14]
8              OR    RC[17],RC[15],RC[16]
9              BC    RC[17],DO_ELSE
10  DO_IF      JSR   ADDRESS OF X()
11             JMP   PAST_ELSE
12  DO_ELSE    JSR   ADDRESS OF Y()
13  PAST_ELSE  LDI   RA[10],1

Most notably seen at lines 2-5 of Table 5, the Boolean register set C 
allows the microprocessor to perform the three test portions back-to-back 
without intervening branching. Each Boolean comparison specifies two 
operands, a destination, and a Boolean condition for which to test. For 
example, the comparison at line 2 of Table 5 compares the contents of RA[2] 
to the contents of RA[3], tests them for equality, and stores into RC[11] the 
Boolean value of the result of the comparison. Note that each comparison of 
the Boolean function stores its respective intermediate results in a separate 
Boolean register. As will be understood with reference to the 
above-referenced related applications, the IEU 10 is capable of simultaneously 
performing more than one of the comparisons. 

After at least the first two comparisons at lines 2-3 of Table 5 have
been completed, the two respective comparison results are AND-ed together
as shown at line 6 of Table 5. RC[15] then holds the result of the first portion
of the test. The results of the second and third portions of the Boolean
function are OR-ed together as seen in Table 5, line 7. It will be understood
that, because there are no data dependencies involved, the AND at line 6 and
the OR at line 7 may be performed in parallel. Finally, the results of those
two operations are OR-ed together as seen at line 8 of Table 5.

It will be understood that register RC[17] will then contain a Boolean
value indicating the truth or falsity of the entire complex Boolean function of
Table 3. It is then possible to perform a single conditional branch, shown at
line 9 of Table 5. In the mode shown in Table 5, the method branches to the
"ELSE" code if Boolean register RC[17] is clear, indicating that the complex
function was failed. The remainder of the code may be the same as it was
without the Boolean register set as seen in Table 4.
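
The sequence of Table 5 can likewise be sketched in software. The Python below is a minimal illustration under stated assumptions, not a description of the IEU: the function name `run_table5` is hypothetical, the Boolean register set is modelled as a list of thirty-two flags (the size is an assumption), and sequential Python statements stand in for comparisons that the hardware may perform simultaneously.

```python
def run_table5(ra):
    """Evaluate the Table 5 sequence using a Boolean register set.

    `ra` maps integer register numbers to values. Returns the routine
    reached ("X" or "Y") and the number of conditional branches executed.
    """
    rc = [False] * 32  # Boolean register set C (size of 32 is assumed)

    # Lines 2-5: four independent comparisons, no intervening branches.
    rc[11] = ra[2] == ra[3]   # CMP RC[11],RA[2],RA[3],EQ
    rc[12] = ra[4] > ra[5]    # CMP RC[12],RA[4],RA[5],GT
    rc[13] = ra[6] < ra[7]    # CMP RC[13],RA[6],RA[7],LT
    rc[14] = ra[8] != ra[9]   # CMP RC[14],RA[8],RA[9],NE

    # Lines 6-8: Boolean combinational instructions.
    rc[15] = rc[11] and rc[12]  # AND RC[15],RC[11],RC[12]
    rc[16] = rc[13] or rc[14]   # OR  RC[16],RC[13],RC[14]
    rc[17] = rc[15] or rc[16]   # OR  RC[17],RC[15],RC[16]

    # Line 9: the single conditional branch, taken when RC[17] is clear.
    branches = 1
    return ("X" if rc[17] else "Y"), branches
```

Whatever the register contents, this model executes exactly one conditional branch, at line 9, because every intermediate result is held in its own Boolean register rather than in shared status flags.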

The Boolean functional unit 70 is responsive to the instruction class,
opcode, and function select fields as are the other functional units. Thus, it
will be understood with reference to Table 5 again, that the integer and/or
floating point functional units will perform the instructions in lines 1-5 and 13,
and the Boolean functional unit 70 will perform the Boolean bitwise
combination instructions in lines 6-8. The control flow and branching
instructions in lines 9-12 will be performed by elements of the IEU 10 which
are not shown in Fig. 1.


III. Data Paths


Figs. 2-5 illustrate further details of the data paths within the floating
point, integer, and Boolean portions of the IEU, respectively.

A. Floating Point Portion Data Paths 

As seen in Fig. 2, the register set FB 20 is a multi-ported register set.
In one embodiment, the register set FB 20 has two write ports WFB0-1, and
five read ports RDFB0-4. The floating point functional unit 68 of Fig. 1 is
comprised of the ALU2 102, FALU 104, MULT 106, and NULL 108 of Fig.
2. All elements of Fig. 2 except the register set 20 and the elements 102-108
comprise the SMC unit B of Fig. 1.

External, bidirectional data bus EX_DATA[] provides data to the
floating point load/store unit 122. Immediate floating point data bus
LDF_IMED[] provides data from a "load immediate" instruction. Other
immediate floating point data are provided on busses RFF1_IMED and
RFF2_IMED, such as is involved in an "add immediate" instruction. Data are
also provided on bus EX_SR_DT[], in response to a "special register move"
instruction. Data may also arrive from the integer portion, shown in Fig. 3,
on busses 114 and 120.

The floating point register set's two write ports WFB0 and WFB1 are
coupled to write multiplexers 110-0 and 110-1, respectively. The write
multiplexers 110 receive data from: the ALU0 or SHF0 of the integer portion
of Fig. 3; the FALU; the MULT; the ALU2; either EX_SR_DT[] or
LDF_IMED[]; and EX_DATA[]. Those skilled in the art will understand that
control signals (not shown) determine which input is selected at each port, and
address signals (not shown) determine to which register the input data are
written. Multiplexer control and register addressing are within the skill of
persons in the art, and will not be discussed for any multiplexer or register set
in the present invention.

The floating point register set's five read ports RDFB0 to RDFB4 are
coupled to read multiplexers 112-0 to 112-4, respectively. Each read
multiplexer also receives data from: either EX_SR_DT[] or
LDF_IMED[], on load immediate bypass bus 126; a load external data bypass
bus 127, which allows external load data to skip the register set FB; the
output of the ALU2 102, which performs non-multiplication integer
operations; the FALU 104, which performs non-multiplication floating point
operations; the MULT 106, which performs multiplication operations; and
either the ALU0 140 or the SHF0 144 of the integer portion shown in Fig. 3,
which respectively perform non-multiplication integer operations and shift
operations. Read multiplexers 112-1 and 112-3 also receive data from
RFF1_IMED[] and RFF2_IMED[], respectively.

Each arithmetic-type unit 102-106 in the floating point portion receives
two inputs, from respective sets of first and second source multiplexers S1 and
S2. The first source of each unit ALU2, FALU, and MULT comes from the
output of either read multiplexer 112-0 or 112-2, and the second source comes
from the output of either read multiplexer 112-1 or 112-3. The sources of the
FALU and the MULT may also come from the integer portion of Fig. 3 on
bus 114.

The results of the ALU2, FALU, and MULT are provided back to the
write multiplexers 110 for storage into the floating point registers RF[], and
also to the read multiplexers 112 for re-use as operands of subsequent
operations. The FALU also outputs a signal FALU_BD indicating the
Boolean result of a floating point comparison operation. FALU_BD is
calculated directly from internal zero and sign flags of the FALU.
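
How a one-bit comparison result might be derived from a zero flag and a sign flag can be sketched as follows. This Python fragment is an illustration under assumptions rather than the FALU's actual logic: the function names are hypothetical, the flags are taken from the difference of the two operands, the condition codes LE and GE are assumed (only EQ, GT, LT, and NE appear in Table 5), and special cases such as overflow or not-a-number operands are ignored.

```python
def flags_from_subtract(a, b):
    """Return (zero, sign) flags for the difference a - b."""
    diff = a - b
    return diff == 0, diff < 0


def boolean_result(zero, sign, condition):
    """Map the zero and sign flags to a one-bit comparison result."""
    table = {
        "EQ": zero,
        "NE": not zero,
        "LT": sign,
        "GT": not zero and not sign,
        "LE": zero or sign,   # assumed condition code
        "GE": not sign,       # assumed condition code
    }
    return table[condition]
```

For example, `boolean_result(*flags_from_subtract(1.5, 2.5), "LT")` evaluates the condition "1.5 is less than 2.5" from the two flags alone, without retaining the difference itself.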

Null byte tester NULL 108 performs null byte testing operations upon
an operand from a first source multiplexer, in one mode that of the ALU2.
NULL 108 outputs a Boolean signal NULLB_BD indicating whether the
thirty-two-bit first source operand includes a byte of value zero.
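
The null byte test described above can be expressed compactly in software. The following Python sketch is illustrative; the function name `has_null_byte` is an assumption, and the hardware presumably examines all four bytes at once rather than in a loop.

```python
def has_null_byte(word):
    """Return True if any of the four bytes of a 32-bit word is zero."""
    word &= 0xFFFFFFFF  # restrict to thirty-two bits
    return any(((word >> shift) & 0xFF) == 0 for shift in (0, 8, 16, 24))
```

The returned value corresponds to the single Boolean bit carried by a signal such as NULLB_BD.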

The outputs of read multiplexers 112-0, 112-1, and 112-4 are provided
to the integer portion (of Fig. 3) on bus 118. The output of read multiplexer
112-4 is also provided as STDT_FP[] store data to the floating point load/store
unit 122.

Fig. 5 illustrates further details of the control of the S1 and S2
multiplexers. As seen, in one embodiment, each S1 multiplexer may be
responsive to bit B1 of the instruction I[], and each S2 multiplexer may be
responsive to bit B2 of the instruction I[]. The S1 and S2 multiplexers select
the sources for the various functional units. The sources may come from
either of the register files, as controlled by the B1 and B2 bits of the
instruction itself. Additionally, each register file includes two read ports from
which the sources may come, as controlled by hardware not shown in the
Figs.




B. Integer Portion Data Paths 

As seen in Fig. 3, the register set A 18 is also multi-ported. In one
embodiment, the register set A 18 has two write ports WA0-1, and five read
ports RDA0-4. The integer functional unit 66 of Fig. 1 is comprised of the
ALU0 140, ALU1 142, SHF0 144, and NULL 146 of Fig. 3. All elements
of Fig. 3 except the register set 18 and the elements 140-146 comprise the
SMC unit A of Fig. 1.

External data bus EX_DATA[] provides data to the integer load/store
unit 152. Immediate integer data on bus LDI_IMED[] are provided in
response to a "load immediate" instruction. Other immediate integer data are
provided on busses RFA1_IMED and RFA2_IMED in response to non-load
immediate instructions, such as an "add immediate". Data are also provided
on bus EX_SR_DT[] in response to a "special register move" instruction.
Data may also arrive from the floating point portion (shown in Fig. 2) on
busses 116 and 118.

The integer register set's two write ports WA0 and WA1 are coupled
to write multiplexers 148-0 and 148-1, respectively. The write multiplexers
148 receive data from: the FALU or MULT of the floating point portion (of
Fig. 2); the ALU0; the ALU1; the SHF0; either EX_SR_DT[] or
LDI_IMED[]; and EX_DATA[].

The integer register set's five read ports RDA0 to RDA4 are coupled
to read multiplexers 150-0 to 150-4, respectively. Each read multiplexer also
receives data from: either EX_SR_DT[] or LDI_IMED[] on load immediate
bypass bus 160; a load external data bypass bus 154, which allows external
load data to skip the register set A; ALU0; ALU1; SHF0; and either the
FALU or the MULT of the floating point portion (of Fig. 2). Read
multiplexers 150-1 and 150-3 also receive data from RFA1_IMED[] and
RFA2_IMED[], respectively.

Each arithmetic-type unit 140-144 in the integer portion receives two
inputs, from respective sets of first and second source multiplexers S1 and S2.
The first source of ALU0 comes from either the output of read multiplexer
150-2, or a thirty-two-bit wide constant zero (0000 hex), or floating point read
multiplexer 112-4. The second source of ALU0 comes from either read
multiplexer 150-3 or floating point read multiplexer 112-1. The first source
of ALU1 comes from either read multiplexer 150-0 or IF_PC[]. IF_PC[] is
used in calculating a return address needed by the instruction fetch unit (not
shown), due to the IEU's ability to perform instructions in an out-of-order
sequence. The second source of ALU1 comes from either read multiplexer
150-1 or CF_OFFSET[]. CF_OFFSET[] is used in calculating a return
address for a CALL instruction, also due to the out-of-order capability.

The first source of the shifter SHF0 144 is from either: floating point
read multiplexer 112-0 or 112-4; or any integer read multiplexer 150. The
second source of SHF0 is from either: floating point read multiplexer 112-0
or 112-4; or integer read multiplexer 150-0, 150-2, or 150-4. SHF0 takes a
third input from a shift amount multiplexer (SA). The third input controls
how far to shift, and is taken by the SA multiplexer from either: floating
point read multiplexer 112-1; integer read multiplexer 150-1 or 150-3; or a
five-bit wide constant thirty-one (11111₂ or 31₁₀). The shifter SHF0 requires
a fourth input from the size multiplexer (S). The fourth input controls how
much data to shift, and is taken by the S multiplexer from either: read
multiplexer 150-1; read multiplexer 150-3; or a five-bit wide constant sixteen
(10000₂ or 16₁₀).

The results of the ALU0, ALU1, and SHF0 are provided back to the
write multiplexers 148 for storage into the integer registers RA[], and also to
the read multiplexers 150 for re-use as operands of subsequent operations.
The output of either ALU0 or SHF0 is provided on bus 120 to the floating
point portion of Fig. 2. The ALU0 and ALU1 also output signals ALU0_BD
and ALU1_BD, respectively, indicating the Boolean results of integer
comparison operations. ALU0_BD and ALU1_BD are calculated directly
from the zero and sign flags of the respective functional units. ALU0 also
outputs signals EX_TADR[] and EX_VM_ADR[]. EX_TADR[] is the target
address generated for an absolute branch instruction, and is sent to the IFU
(not shown) for fetching the target instruction. EX_VM_ADR[] is the virtual
address used for all loads from memory and stores to memory, and is sent to
the VMU (not shown) for address translation.

Null byte tester NULL 146 performs null byte testing operations upon
an operand from a first source multiplexer. In one embodiment, the operand
is from the ALU0. NULL 146 outputs a Boolean signal NULLA_BD
indicating whether the thirty-two-bit first source operand includes a byte of
value zero.

The outputs of read multiplexers 150-0 and 150-1 are provided to the
floating point portion (of Fig. 2) on bus 114. The output of read multiplexer
150-4 is also provided as STDT_INT[] store data to the integer load/store unit
152.

A control bit PSR[7] is provided to the register set A 18. It is this
signal which, in Fig. 1, is provided from the mode control unit 44 to the IEU
mode integer switch 34 on line 46. The IEU mode integer switch is internal
to the register set A 18 as shown in Fig. 3.
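
The effect of the mode bit on register access can be modelled as follows. This Python sketch is a simplified illustration, not the circuit of Fig. 3: the class and method names are hypothetical, the subset boundaries (a first subset RA[23:0], a second subset RA[31:24], and a shadow subset RT[31:24]) follow the Abstract, and the polarity of PSR[7] (a value of one selecting the second mode) is an assumption.

```python
class IntegerRegisterSetA:
    """Integer register set with a shadow subset selected by a mode bit."""

    def __init__(self):
        self.ra = [0] * 32   # RA[31:0]
        self.rt = [0] * 8    # shadow subset RT[31:24]
        self.psr7 = 0        # mode bit; 1 = second mode (assumed polarity)

    def _route(self, index):
        """Return the backing list and position for a register number."""
        if not 0 <= index < 32:
            raise IndexError("register number out of range")
        if self.psr7 and index >= 24:
            return self.rt, index - 24   # re-routed to the shadow subset
        return self.ra, index

    def read(self, index):
        bank, pos = self._route(index)
        return bank[pos]

    def write(self, index, value):
        bank, pos = self._route(index)
        bank[pos] = value & 0xFFFFFFFF   # thirty-two-bit registers
```

In this model a routine running in the second mode may write register 30 freely; the value previously written to RA[30] in the first mode is untouched and is visible again once the mode bit is cleared, so no explicit save and restore is required.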

Fig. 6 illustrates further details of the control of the S1 and S2
multiplexers. The signal ALU0_BD 

C. Boolean Portion Data Paths 

As seen in Fig. 4, the register set C 22 is also multi-ported. In one
embodiment, the register set C 22 has two write ports WC0-1, and five read
ports RDC0-4. All elements of Fig. 4 except the register set 22 and the
Boolean combinational unit 70 comprise the SMC unit C of Fig. 1.

The Boolean register set's two write ports WC0 and WC1 are coupled
to write multiplexers 170-0 and 170-1, respectively. The write multiplexers
170 receive data from: the output of the Boolean combinational unit 70,
indicating the Boolean result of a Boolean combinational operation;
ALU0_BD from the integer portion of Fig. 3, indicating the Boolean result of
an integer comparison; FALU_BD from the floating point portion of Fig. 2,
indicating the Boolean result of a floating point comparison; either
ALU1_BD_P from ALU1, indicating the results of a compare instruction in
ALU1, or NULLA_BD from NULL 146, indicating a null byte in the integer
portion; and either ALU2_BD_P from ALU2, indicating the results of a
compare operation in ALU2, or NULLB_BD from NULL 108, indicating a
null byte in the floating point portion. In one mode, the ALU0_BD,
ALU1_BD, ALU2_BD, and FALU_BD signals are not taken from the data
paths, but are calculated as a function of the zero flag, minus flag, carry flag,
and other condition flags in the PSR. In one mode, wherein up to eight
instructions may be executing at one instant in the IEU, the IEU maintains up
to eight PSRs.

The Boolean register set C is also coupled to bus EX_SR_DT[], for use
with "special register move" instructions. The CSR may be written or read
as a whole, as though it were a single thirty-two-bit register. This enables
rapid saving and restoration of machine state information, such as may be
necessary upon certain drastic system errors or upon certain forms of grand
scale context switching.
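
Treating the Boolean registers as one word can be illustrated with a pair of helper routines. The Python below is a sketch under assumptions: the function names are hypothetical, the register set is taken to hold thirty-two one-bit registers, and the mapping of register number to bit position (register n to bit n) is assumed rather than taken from the text.

```python
def pack_boolean_registers(rc):
    """Pack 32 one-bit Boolean registers into a single 32-bit word."""
    if len(rc) != 32:
        raise ValueError("expected exactly 32 Boolean registers")
    word = 0
    for index, bit in enumerate(rc):
        if bit:
            word |= 1 << index  # RC[index] -> bit `index` (assumed order)
    return word


def unpack_boolean_registers(word):
    """Restore the 32 Boolean registers from a 32-bit word."""
    return [bool((word >> index) & 1) for index in range(32)]
```

Saving the whole set then amounts to storing one word, and restoring it amounts to a single call to `unpack_boolean_registers`, which mirrors the single "special register move" described above.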

The Boolean register set's five read ports RDC0 to RDC4 are coupled
to read multiplexers 172-0 to 172-4, respectively. The read multiplexers 172
receive the same set of inputs as the write multiplexers 170 receive. The
Boolean combinational unit 70 receives inputs from read multiplexers 172-0
and 172-1. Read multiplexers 172-2 and 172-3 respectively provide signals
BLBP_CPORT and BLBP_DPORT. BLBP_CPORT is used as the basis for
conditional branching instructions in the IEU. BLBP_DPORT is used in the
"add with Boolean" instruction, which sets an integer register in the A or B
set to zero or one (with leading zeroes), depending upon the content of a
register in the C set. Read port RDC4 is presently unused, and is reserved for
future enhancements of the Boolean functionality of the IEU.


IV. Conclusion

While the features and advantages of the present invention have been 
described with respect to particular embodiments thereof, and in varying 
degrees of detail, it will be appreciated that the invention is not limited to the 
described embodiments. The following Claims define the invention to be 
afforded patent coverage.

We claim:

1. An apparatus executing a set of instructions, the instructions
including one or more fields, wherein a field of a given instruction specifies
a source of an operand of the given instruction or a destination of a result of
the given instruction, and wherein the apparatus comprises: 
processing means for executing the instructions; and 
a register file, coupled to the processing means, for storing operands 
and results of the instructions, wherein, 
the register file includes a plurality of register sets, and
the register file is responsive to one or more of the fields in a 
given instruction to retrieve an operand of the given instruction from, or store 
a result of the given instruction into, a given register in a given one of the 
register sets as identified by the one or more fields in the given instruction. 

2. The apparatus of Claim 1, wherein the instructions include
Boolean combinational instructions each operating on one or more Boolean 
operands to generate a Boolean result, each Boolean combinational instruction 
including one or more Boolean fields specifying a location of each operand 
and result, and wherein: 

the processing means includes Boolean execution means for executing
the Boolean combinational instructions; 

the register file includes a Boolean register set of Boolean registers, 
each Boolean register for holding one of said Boolean operands or Boolean 
results; and 

the register file is responsive to each said Boolean field in a given
Boolean combinational instruction independent of what Boolean combinational
operation is specified by the given Boolean combinational instruction.

3. The apparatus of Claim 2, wherein the instructions include
Boolean comparison instructions each operating on one or more operands to 
generate a Boolean result, each Boolean comparison instruction including a 
Boolean result field specifying a location, in the Boolean register set, of the
Boolean result, and wherein:

the processing means includes comparison means for executing the 
Boolean comparison instructions; and 

the register file is responsive to the Boolean result field in a given 
Boolean instruction independent of what Boolean comparison operation is 
specified by the given Boolean comparison instruction.

4. The apparatus of Claim 1, wherein the instructions include 
integer instructions each operating on one or more integer operands to 
generate an integer result, each integer instruction including one or more 
integer fields specifying a location of each operand and result, and wherein: 

the processing means includes integer execution means for executing
the integer instructions; and 

the register file includes an integer register set of integer registers, each 
integer register for holding one of said integer operands or integer results. 

5. The apparatus of Claim 4, wherein the register file further 
comprises:

a plurality of integer register sets. 

6. The apparatus of Claim 1, wherein the instructions include 
floating point instructions each operating on one or more floating point 
operands to generate a floating point result, each floating point instruction
including one or more floating point fields specifying a location of each
operand and result, and wherein: 

the processing means includes floating point execution means for 
executing the floating point instructions; and

the register file includes a floating point register set of floating point
registers, each floating point register for holding one of said floating point 
operands or floating point results. 

7. An apparatus comprising: 
means for executing Boolean instructions, the Boolean instructions
performing Boolean operations upon operands to generate Boolean results and 
each Boolean instruction indicating a destination for storage of the Boolean 
results of the Boolean instruction; 

a plurality of Boolean register means each for holding a Boolean value; and

means, responsive to execution of a given Boolean instruction by said 
means for executing, for storing the given Boolean instruction's Boolean result 
into one of said Boolean register means, the one Boolean register means being 
indicated by said given Boolean instruction as the destination of its Boolean 
result.

8. The apparatus of Claim 7, wherein the means for executing 
Boolean instructions comprises: 

numerical execution means for executing numerical comparison 
instructions to compare two multi-bit numerical operands and to accordingly 
produce a single-bit Boolean value result.

9. The apparatus of Claim 8, wherein the numerical execution 
means comprises: 

integer execution means for comparing two multi-bit integer operands. 

10. The apparatus of Claim 8, wherein the numerical execution 
means comprises:

floating point execution means for comparing two multi-bit floating 
point operands.

11. The apparatus of Claim 10, wherein the numerical execution
means further comprises: 

integer execution means for comparing two multi-bit integer operands. 

12. The apparatus of Claim 7, wherein the means for executing 
Boolean instructions comprises:

Boolean execution means for executing Boolean combinational 
instructions to combine two Boolean value operands and to accordingly 
produce a single-bit Boolean value result. 

13. The apparatus of Claim 12, wherein the means for executing 
Boolean instructions further comprises:

numerical execution means for executing numerical comparison 
instructions to compare two multi-bit numerical operands and to accordingly 
produce a single-bit Boolean value result. 

14. The apparatus of Claim 13, wherein the numerical execution 
means comprises:

integer execution means for comparing two multi-bit integer operands; and

floating point execution means for comparing two multi-bit floating 
point operands. 

15. The apparatus of Claim 7 further comprising:

numerical register means for holding integer and floating point values; 
numerical execution means for executing numerical comparison
instructions, wherein execution of each given numerical comparison
instruction,

i) retrieves two or more multi-bit numerical operands from
respective numerical register means specified by the given numerical
comparison instruction,

ii) compares the two or more numerical operands according
to a condition specified by the given numerical comparison instruction, 

iii) produces a first single-bit Boolean value result according 
to the condition, 

iv) stores the first Boolean value result in a given one of
said Boolean register means as specified by the given numerical comparison 
instruction, 

wherein the numerical execution means includes, 

i) integer execution means for comparing two multi-bit 
integer operands, and

ii) floating point execution means for comparing two 
multi-bit floating point operands; and

Boolean execution means for executing Boolean combinational 
instructions, wherein execution of each given Boolean combinational 
instruction,

i) retrieves one or more Boolean value operands from 
respective Boolean register means as specified by the given Boolean 
combinational instruction, 

ii) combines the one or more Boolean value operands 
according to an operation specified by the given Boolean combinational
instruction, 

iii) produces a second single-bit Boolean value result 
according to the operation, and 

iv) stores the second Boolean value result in a given one of 
said Boolean register means as specified by the given Boolean combinational
instruction. 

16. The apparatus of Claim 7, wherein: 
the plurality of Boolean register means includes, 
i) a first set of Boolean registers, and
ii) a second set of Boolean registers; and the apparatus
further comprises 

means, coupled to the plurality of Boolean register means, for selecting 
the first or the second set of Boolean registers as a currently active set; and
the means for storing is responsive to the means for selecting, to store
results into Boolean registers in the currently active set only. 

17. An apparatus for use with a data processing system, the data 
processing system including means for executing Boolean instructions, each 
Boolean instruction performing a given Boolean operation upon two or more
operands to generate a one-bit Boolean result, the apparatus comprising:

a Boolean register set including a plurality of individually addressable 
one-bit registers; and 

control means for writing the one-bit result of a given Boolean 
instruction into one of said one-bit registers, the one one-bit register being 
specified by the given Boolean instruction's contents.

18. The apparatus of Claim 17, wherein the Boolean instructions 
include Boolean combinational instructions, each Boolean combinational 
instruction specifying a Boolean operation to be performed upon a first and a 
second operand to generate the result, and specifying a first address of the first
operand and a second address of the second operand and a third address of a
destination for the result, wherein: 

the control means is further for reading the first and second operands 
from the Boolean register set at the first and second addresses, respectively, 
and wherein the one one-bit register is specified by the third address. 

19. The apparatus of Claim 18, wherein the means for executing
includes means for executing plural Boolean instructions in parallel, wherein 
there may exist, in the plural Boolean instructions, data dependency between 
one or more slave instructions and a master instruction, each slave instruction
having the result of the master instruction as an operand such that the slave
instruction cannot be executed until the result of the master instruction has 
been generated, the means for executing further includes means for delaying 
data dependent instructions until their dependent data supplying instruction is
completed and its result is generated, and wherein:

a prespecified constant Boolean register of the one-bit registers has a 
predetermined constant data value which does not change upon the control 
means writing another value to the prespecified constant Boolean register; and 
the control means is responsive to a master instruction whose
destination is the prespecified constant Boolean register, to immediately read
the predetermined constant data value for supply to the slave instructions, 
whereby the means for executing is enabled to execute the slave instructions 
before the result of the master instruction is generated. 

20. An apparatus comprising: 
execution means for executing instructions, the instructions performing
operations upon operands to generate results, each instruction specifying a 
respective source address for each operand and a destination address for the 
result of the instruction, each address specifying a register set and an offset; 
a first register set including a plurality of individually addressable 
registers each for storing a value of a first data type;

first access means for writing and reading values to and from the first 
register set according to a given instruction, the first access means including, 

i) first reading means, responsive to the given instruction 
having a given source address which specifies the first register set as a source
for an operand of the given instruction, for reading the operand's value from
the first register set at the offset specified by the given source address, and 

ii) first writing means, responsive to the given instruction 
having a given destination address which specifies the first register set as a 
destination for the result of the given instruction, for writing the result's value
to the first register set at the offset specified by the given destination address;

a second register set including a plurality of individually addressable
registers each for storing a value of the first data type; and 

second access means for writing and reading values to and from the 
second register set according to the given instruction, the second access means 
including,

i) second reading means, responsive to the given 
instruction having a given source address which specifies the second register 
set as a source for an operand of the given instruction, for reading the 
operand's value from the second register set at the offset specified by the
given source address, and

ii) second writing means, responsive to the given instruction 
having a given destination address which specifies the second register set as 
a destination for the result of the given instruction, for writing the result's 
value to the second register set at the offset specified by the given destination 

15 address. 
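
By way of illustration only, the addressing scheme recited in Claim 20 can be sketched in software. The sketch below assumes an address is a pair of a register set identifier and an offset; the class names, the set identifiers 0 and 1, the set size of 32, and the `execute_add` helper are hypothetical and do not appear in the claims.

```python
class RegisterSet:
    """One set of individually addressable registers of a single data type."""

    def __init__(self, size=32):  # size is an assumed value
        self.regs = [0] * size

    def read(self, offset):
        return self.regs[offset]

    def write(self, offset, value):
        self.regs[offset] = value


class TwoSetRegisterFile:
    """Two register sets of the same data type, each with its own access path."""

    def __init__(self, size=32):
        # Hypothetical set identifiers: 0 = first set, 1 = second set.
        self.sets = {0: RegisterSet(size), 1: RegisterSet(size)}

    def read(self, address):
        set_id, offset = address  # an address names a set and an offset
        return self.sets[set_id].read(offset)

    def write(self, address, value):
        set_id, offset = address
        self.sets[set_id].write(offset, value)

    def execute_add(self, src1, src2, dst):
        # Each source and the destination may name either register set.
        self.write(dst, self.read(src1) + self.read(src2))
```

In this model a single instruction may draw one operand from each set and write its result to either set, since every address carries its own set identifier.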


21. The apparatus of Claim 20, wherein: 

a given instruction may specify a first and a second source address and
a destination address, with each address specifying either of the first or second
register sets such that the given instruction requires access to both register
sets; and

the first and second access means operate simultaneously to provide the 
instruction parallel access to both the first and second register sets. 

22. In a data processing system, which includes a central processing
unit (CPU) which performs operations according to an instruction, the
operations operating upon data of a first data type, a data register system
comprising:

a first register set including a plurality of first registers each for
holding a datum of the first data type, and including means for accessing the
first registers in response to the instruction; and

a second register set including a plurality of second registers each for
holding a datum of the first data type, and including means for accessing the
second registers in response to the instruction.

23. The data register system of Claim 22, wherein the instruction
includes a field specifying which of the first and second register sets is to be
accessed in response to the instruction, and wherein the data register system
further comprises:

means, responsive to the field, for accessing the first register set or the 
second register set as specified by the field. 

24. An apparatus comprising:

integer execution means for executing integer instructions, each integer 
instruction performing an integer operation upon one or more integer value 
operands and generating an integer value result; 

floating point execution means for executing floating point instructions,
each floating point operation performing a floating point operation upon one
or more floating point value operands and generating a floating point value
result;

wherein each instruction specifies one or more sources from which its
one or more operands are to be retrieved and further specifies a destination to
which its result is to be stored, each operation also optionally specifying an
integer value base and an integer value index;

a register bank including,

i) first register set means, having a plurality of first
registers, for holding integer values and floating point values;

access means, coupled to the first register set means and to both
execution means, for,

i) retrieving, from any one first register, an integer value
operand for the integer execution means, a floating point value operand for the
floating point execution means, or an integer value base or index for either
execution means, as indicated by an instruction, and

ii) for storing, into any one first register, an integer value
result from the integer execution means or a floating point value result from
the floating point execution means, as indicated by an instruction.

25. The apparatus of Claim 24, wherein: 

the register bank further comprises second register set means, having
a plurality of second registers, for holding integer values; and

the access means is further for,

i) retrieving, from any one second register, an integer
value operand for the integer execution means, or an integer value base or
index for either execution means, as indicated by an instruction, and

ii) for storing, into any one second register, an integer
value result from the integer execution means, as indicated by an instruction.

26. The apparatus of Claim 25, further comprising:

Boolean execution means for executing Boolean combinational 
instructions, each Boolean combinational instruction performing a Boolean 
combinational operation upon one or more Boolean value operands and 
generating a Boolean value result; 
the register bank further comprises third register set means, having a
plurality of third registers, for holding Boolean values; and

the access means is further for,

i) retrieving, from any one third register, a Boolean value
operand for the Boolean execution means, as indicated by a Boolean
combinational instruction, and

ii) for storing, into any one third register, a Boolean value
result from the Boolean execution means, as indicated by a Boolean
combinational instruction.
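
By way of illustration only, the typed register bank of Claims 24 to 26 may be modelled as three register sets with a single access path that enforces which value types each set may hold. The set names "rf", "ri" and "rb", the set size, and the `compare_lt` and `bool_and` helpers are hypothetical; the comparison helper is included only to show how a Boolean register, rather than a fixed status flag, can receive a result.

```python
class TypedRegisterBank:
    """Illustrative model only; names and sizes are assumptions."""

    def __init__(self, size=32):
        self.sets = {
            "rf": [0] * size,      # first set: integer or floating point values
            "ri": [0] * size,      # second set: integer values only
            "rb": [False] * size,  # third set: Boolean values only
        }

    @staticmethod
    def _check(set_name, value):
        is_bool = isinstance(value, bool)
        if set_name == "rf":
            ok = isinstance(value, (int, float)) and not is_bool
        elif set_name == "ri":
            ok = isinstance(value, int) and not is_bool
        elif set_name == "rb":
            ok = is_bool
        else:
            raise KeyError(set_name)
        if not ok:
            raise TypeError(f"{value!r} cannot be stored in set {set_name}")

    def store(self, set_name, index, value):
        self._check(set_name, value)
        self.sets[set_name][index] = value

    def retrieve(self, set_name, index):
        return self.sets[set_name][index]

    def compare_lt(self, src_a, src_b, dst_bool):
        # Sources name (set, index); the result goes to a chosen Boolean register.
        result = self.retrieve(*src_a) < self.retrieve(*src_b)
        self.store("rb", dst_bool, result)

    def bool_and(self, src_a, src_b, dst):
        # Boolean combinational operation on two Boolean registers.
        result = self.retrieve("rb", src_a) and self.retrieve("rb", src_b)
        self.store("rb", dst, result)
```

Two comparisons followed by `bool_and` evaluate a compound condition without any intervening branch, which is the use the Boolean register set is directed to.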

27. An apparatus, for use with a data processing system which
performs read operations and write operations upon data values of a first data
type and a first data width and upon data values of a second data type and a
second data width different than the first data width, the data processing
system specifying a read address and data type for each read and a write
address and data content for each write, the apparatus comprising:

a register set including a plurality of individually addressable registers,
each register being wide enough to hold a value of either data width;

read access means, responsive to the data processing system performing
a given read operation, for accessing the register set to retrieve data contents
of a given register, which is individually addressed at the given read
operation's specified read address, and for providing to the data processing
system such portion of the retrieved data contents as the data type of the read
operation specifies; and

write access means, responsive to the data processing system
performing a given write operation, for accessing the register set to store into
a given register, which is individually addressed at the given write operation's
specified write address, the data content specified by the write operation.

28. The apparatus of Claim 27, wherein the first data type is
floating point, the first data width is sixty-four bits, the second data type is
integer, the second data width is thirty-two bits, and wherein:

the register set is sixty-four bits wide; and

the read and write access means respectively retrieve and store
sixty-four bits responsive to the data processing system performing floating
point operations, and thirty-two bits responsive to the data processing system
performing integer operations.
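
By way of illustration only, the width-dependent access of Claims 27 and 28 can be sketched as a set of sixty-four bit registers read and written at either width. The sketch assumes, without support in the claims, that thirty-two bit integer accesses use the low-order half of the register and leave the high-order half unchanged; the class and method names are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


class RetypableRegisterSet:
    """Each register holds 64 raw bits; the access type selects the width."""

    def __init__(self, size=32):  # size is an assumed value
        self.bits = [0] * size

    def write_float(self, index, value):
        # Store all 64 bits of the double precision encoding.
        (raw,) = struct.unpack("<Q", struct.pack("<d", value))
        self.bits[index] = raw

    def read_float(self, index):
        # Retrieve all 64 bits and reinterpret them as a double.
        (value,) = struct.unpack("<d", struct.pack("<Q", self.bits[index]))
        return value

    def write_int(self, index, value):
        # Assumption: an integer write replaces only the low 32 bits.
        high = self.bits[index] & (MASK64 ^ MASK32)
        self.bits[index] = high | (value & MASK32)

    def read_int(self, index):
        # Assumption: an integer read returns only the low 32 bits.
        return self.bits[index] & MASK32
```

The same physical register thus serves both instruction types, and the data type of the access alone determines how many bits are transferred.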

29. An apparatus for use with a data processing system which
executes instructions, each instruction performing operations upon one or more
operands and generating a result, wherein each instruction specifies one or
more sources from which its one or more operands are to be retrieved and
further specifies a destination to which its result is to be stored, wherein the
data processing system operates in a plurality of modes, the apparatus
comprising:

a plurality of first register means each for holding an operand or a
result;

a plurality of second register means each for holding an operand or a
result; and

switch means, responsive to the mode of the data processing system,
for providing the data processing system access to only the plurality of first
register means when the data processing system operates in a first mode, and
for providing the data processing system access to only a first subset of the
plurality of first register means and to the plurality of second register means
when the data processing system operates in a second mode.
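
By way of illustration only, the switch means of Claim 29 may be modelled as an index translation that depends on the processor mode. The sketch borrows the register counts from the abstract (thirty-two integer registers, of which registers 24 through 31 are shadowed by eight further registers); the names `ra`, `rt` and `second_mode` are hypothetical.

```python
class ShadowedRegisterSet:
    """Mode-dependent routing between first registers and shadow registers."""

    SIZE = 32
    SHADOW_START = 24  # registers 24..31 are shadowed in the second mode

    def __init__(self):
        self.ra = [0] * self.SIZE                        # first register means
        self.rt = [0] * (self.SIZE - self.SHADOW_START)  # second register means
        self.second_mode = False                         # processor mode

    def _locate(self, index):
        if not 0 <= index < self.SIZE:
            raise IndexError(index)
        if self.second_mode and index >= self.SHADOW_START:
            # Re-route transparently; the instruction's index is unchanged.
            return self.rt, index - self.SHADOW_START
        return self.ra, index

    def read(self, index):
        store, i = self._locate(index)
        return store[i]

    def write(self, index, value):
        store, i = self._locate(index)
        store[i] = value
```

Because the re-routing is keyed to the mode and not to the instruction, a routine running in the second mode can write register 30 without disturbing the value that a first-mode routine left there.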

30. An apparatus including execution means for executing
instructions, each instruction performing operations on one or more operands
and generating a result, each instruction specifying one or more sources which
are to be accessed to read its one or more operands and a destination which
is to be accessed to write its result, the apparatus further comprising:

a plurality of register banks;

each register bank including a plurality of register means, each register
means for storing an operand or a result, the plurality of register means within
each register bank being arranged in a sequence such that any one given
register means within a given register bank may be accessed as an offset into
the given register bank, wherein the sources and the destination of a given
instruction are specified as offsets; and

register bank selector means for selecting a given register bank into
which the given instruction's source and destination offsets are applied, the
register bank selector means operating independently of any contents of the
given instruction.
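
By way of illustration only, the bank selection of Claim 30 can be sketched as a selector held in processor state, outside the instruction. The number of banks, the bank size, and the names `select_bank` and `execute_add` are hypothetical.

```python
class BankedRegisterFile:
    """Several identical banks; instructions carry offsets only."""

    def __init__(self, banks=2, regs_per_bank=32):  # assumed sizes
        self.banks = [[0] * regs_per_bank for _ in range(banks)]
        self.selected = 0  # selector state, not part of any instruction

    def select_bank(self, bank):
        if not 0 <= bank < len(self.banks):
            raise IndexError(bank)
        self.selected = bank

    def execute_add(self, src1, src2, dst):
        # The offsets are applied to whichever bank is currently selected.
        bank = self.banks[self.selected]
        bank[dst] = bank[src1] + bank[src2]
```

The same instruction, with the same offsets, operates on different physical registers depending solely on the selector, so no instruction or process needs to name a bank.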